WO2016021412A1 - Coding device and method, decoding device and method, and program - Google Patents

Publication number
WO2016021412A1
WO2016021412A1 PCT/JP2015/070924 JP2015070924W WO2016021412A1 WO 2016021412 A1 WO2016021412 A1 WO 2016021412A1 JP 2015070924 W JP2015070924 W JP 2015070924W WO 2016021412 A1 WO2016021412 A1 WO 2016021412A1
Authority
WO
WIPO (PCT)
Prior art keywords
spectrum
extension
frequency
low
band
Prior art date
Application number
PCT/JP2015/070924
Other languages
French (fr)
Japanese (ja)
Inventor
修一郎 錦織
鈴木 志朗
Original Assignee
ソニー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社
Priority to EP15830713.2A (patent EP3179476B1)
Priority to EP19199364.1A (patent EP3608910B1)
Priority to US15/500,253 (patent US10049677B2)
Priority to CN201580041640.XA (patent CN106663449B)
Publication of WO2016021412A1
Priority to US16/037,574 (patent US10510353B2)

Classifications

    • G: PHYSICS
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
                    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
                        • G10L19/0204: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
                        • G10L19/0212: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
                    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
                        • G10L19/16: Vocoder architecture
                            • G10L19/18: Vocoders using multiple modes
                                • G10L19/22: Mode decision, i.e. based on audio signal content versus external parameters
                                • G10L19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
                • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
                            • G10L21/0388: Details of processing therefor
    • H: ELECTRICITY
        • H03: ELECTRONIC CIRCUITRY
            • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
                • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
                    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • The present technology relates to an encoding apparatus and method, a decoding apparatus and method, and a program, and in particular to an encoding apparatus and method, a decoding apparatus and method, and a program that can obtain high-quality sound even in a low-resource environment.
  • An encoding technique that incorporates the concept of bandwidth expansion for audio signals is known (see, for example, Patent Document 1 and Patent Document 2).
  • In such a technique, a time-series signal input as an audio signal is band-divided into a low-frequency component and a high-frequency component. Normal encoding is performed on the low-frequency signal, while the relationship between the low-frequency signal and the high-frequency signal, the characteristics of the high-frequency signal, and the like are transmitted as additional information.
  • At the time of decoding, the low-frequency signal and the additional information are used to generate an extended-band signal, and the low-frequency signal and the extended-band signal are combined to realize bandwidth expansion.
  • More specifically, the low-frequency signal is divided into a plurality of bands by a band division filter, and an extended-band signal is generated from the divided low-frequency signal and the additional information.
  • The low-band signal and the extended-band signal are then synthesized by a band synthesis filter, and a band-expanded time-series signal is obtained.
  • When the band division filter and the band synthesis filter are used in this way, the inherent (algorithmic) delay from the encoding to the decoding of the signal is increased by the band division and band synthesis filter processing. If it does so, the response speed from the input of an audio
  • In addition, filter processing such as band division and band synthesis using a filter bank is required, which greatly increases the amount of processing and memory usage, so it was difficult to implement a decoding device in low-resource environments such as embedded devices.
  • In the technique that performs band extension in the frequency domain, the spectrum obtained by MDCT (Modified Discrete Cosine Transform) is divided into a baseband on the low-frequency side and an extension band on the high-frequency side, and normal encoding is performed on the baseband signal.
  • The relationship between the baseband spectrum and the extension band spectrum, the characteristics of the extension band spectrum, and the like are transmitted as additional information.
  • At the time of decoding, the baseband spectrum and the additional information are used to generate an extension band spectrum, and the baseband spectrum and the extension band spectrum are synthesized to generate a full-band spectrum. An IMDCT (Inverse Modified Discrete Cosine Transform) is then performed on the full-band spectrum, converting it into a time-series signal (time signal).
  • Each frequency bin of the spectrum obtained by MDCT (hereinafter also referred to as the MDCT spectrum) is a value in which the amplitude component and the phase component are interwoven. Therefore, in the technique that performs band extension in the frequency domain, if the amplitude of the extension band spectrum is finely adjusted using the MDCT spectrum at the time of decoding, the phase component of each spectral value and the mutual phase relationships between spectral values are largely destroyed.
  • When the audio signal to be encoded and decoded is a signal such as a noisy musical tone or a human voice, the sound quality of the audio signal does not deteriorate significantly.
  • However, when the audio signal is one in which energy is concentrated at a specific frequency, such as a single musical instrument or a sound effect, that is, a signal with high tonality, the energy that should have been concentrated at the specific frequency spreads into the spectrum of the surrounding frequencies after decoding. If it does so, the audio
  • The technique that performs band extension in the frequency domain does not require band division or band synthesis of a time-series signal, so audio can be encoded and decoded even in a low-resource environment without causing a delay.
  • However, high-quality sound could not be obtained.
  • the present technology has been made in view of such a situation, and is capable of obtaining high-quality sound even in a low-resource environment.
  • The decoding device includes: an acquisition unit that acquires a low-frequency spectrum and either a single extension coefficient for the extension band or an extension coefficient for each of a plurality of bands constituting the extension band, the coefficient being used to obtain an extended spectrum of an extension band different from the low-frequency band; a generation unit that generates the extended spectrum based on the single extension coefficient or the extension coefficient for each of the plurality of bands; and a synthesis unit that synthesizes the low-frequency spectrum and the extended spectrum.
  • the generation unit can generate the extended spectrum based on the low frequency spectrum and the extension coefficient.
  • the generation unit can generate the extended spectrum by adjusting the level of the spectrum obtained from the low-frequency spectrum based on the extension coefficient.
  • When generating the extended spectrum based on the single extension coefficient, the generation unit adjusts the level of the entire extension band of the spectrum based on that extension coefficient; when generating the extended spectrum based on the extension coefficient for each of the plurality of bands, it adjusts the level of each band of the spectrum based on the extension coefficient of that band.
  • the generation unit can generate the extended spectrum by adjusting a predetermined noise level based on the extension coefficient.
  • the value of the low frequency spectrum can be determined by the amplitude component and the phase component of the original time series signal.
  • the low frequency spectrum can be an MDCT spectrum.
  • The decoding method or program according to the first aspect of the present technology includes the steps of: acquiring a low-frequency spectrum and either a single extension coefficient for the extension band or an extension coefficient for each of a plurality of bands constituting the extension band, the coefficient being used to obtain an extended spectrum of an extension band different from the low band; generating the extended spectrum based on the single extension coefficient or the extension coefficient for each of the plurality of bands; and synthesizing the low-frequency spectrum and the extended spectrum.
  • the extension spectrum is generated based on the single extension coefficient or the extension coefficient for each of the plurality of bands, and the low-frequency spectrum and the extension spectrum are combined.
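  • The decoding-side flow described above (acquire, generate, synthesize) can be sketched as follows. This is a minimal illustration, not the method of the specification: how the spectrum for the extension band is derived from the low-frequency spectrum is not stated at this point in the text, so mirroring the top of the low band is an assumption made here, and the names `generate_extended_spectrum`, `synthesize`, and `band_edges` are hypothetical.

```python
from typing import List, Optional, Sequence


def generate_extended_spectrum(
    low: Sequence[float],
    ext_len: int,
    coeffs: Sequence[float],
    band_edges: Optional[Sequence[int]] = None,
) -> List[float]:
    """Level-adjust a spectrum derived from the low band to fill the extension band."""
    if not 0 < ext_len <= len(low):
        raise ValueError("ext_len must be between 1 and len(low)")
    # Assumption: the extension-band source is the low band mirrored about its top edge.
    source = [low[len(low) - 1 - j] for j in range(ext_len)]
    if len(coeffs) == 1:
        # Single coefficient: adjust the level of the whole extension band at once.
        return [coeffs[0] * v for v in source]
    # Per-band coefficients: band i covers bins band_edges[i] .. band_edges[i + 1] - 1.
    if (
        band_edges is None
        or len(band_edges) != len(coeffs) + 1
        or band_edges[0] != 0
        or band_edges[-1] != ext_len
    ):
        raise ValueError("band_edges must run from 0 to ext_len, one band per coefficient")
    out: List[float] = []
    for c, start, stop in zip(coeffs, band_edges[:-1], band_edges[1:]):
        out.extend(c * v for v in source[start:stop])
    return out


def synthesize(low: Sequence[float], extended: Sequence[float]) -> List[float]:
    """Concatenate the low-frequency spectrum and the extended spectrum."""
    return list(low) + list(extended)
```

  • With a single coefficient the whole extension band is scaled uniformly; with several coefficients each band is scaled by its own value, which mirrors the two cases distinguished for the generation unit.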
  • The encoding device includes: a feature amount extraction unit that extracts a feature amount from a spectrum obtained by orthogonally transforming a time-series signal; a calculation unit that calculates, based on the spectrum and according to the feature amount, either a single extension coefficient for the extension band or an extension coefficient for each of a plurality of bands constituting the extension band, the coefficient being used to obtain an extended spectrum of an extension band different from the low band of the spectrum; and a multiplexing unit that multiplexes the extension coefficient and a low-frequency spectrum, which is the low-frequency component of the spectrum, to generate a code string.
  • the feature quantity can be information indicating the tonality of the spectrum.
  • the calculation unit can calculate the single extension coefficient when the tonality of the spectrum is high, and can calculate the extension coefficient for each of the plurality of bands when the tonality of the spectrum is low.
  • the calculation unit can calculate the ratio between the average amplitude of the extension band of the spectrum and the average amplitude of the low band spectrum as the extension coefficient.
  • The calculation unit can calculate envelope information of the extension band of the spectrum as the extension coefficient when the tonality of the low band of the spectrum is high and the tonality of the extension band of the spectrum is low.
  • the value of the spectrum can be determined by the amplitude component and phase component of the time series signal.
  • the orthogonal transform can be MDCT.
  • The encoding method or program according to the second aspect of the present technology includes the steps of: extracting a feature amount from a spectrum obtained by orthogonally transforming a time-series signal; calculating, based on the spectrum and according to the feature amount, either a single extension coefficient for the extension band or an extension coefficient for each of a plurality of bands constituting the extension band, the coefficient being used to obtain an extended spectrum of an extension band different from the low band of the spectrum; and multiplexing the extension coefficient and a low-frequency spectrum, which is the low-frequency component of the spectrum, to generate a code string.
  • In the second aspect of the present technology, a feature amount is extracted from a spectrum obtained by orthogonally transforming a time-series signal; either a single extension coefficient for the extension band or an extension coefficient for each of a plurality of bands constituting the extension band is calculated based on the spectrum according to the feature amount; and the extension coefficient and the low-frequency spectrum, which is the low-frequency component of the spectrum, are multiplexed to generate a code string.
  • high-quality sound can be obtained even in a low resource environment.
  • FIG. 1 is a diagram illustrating a configuration example of an embodiment of an encoding device to which the present technology is applied.
  • The encoding apparatus 11 shown in FIG. 1 includes an MDCT unit 21, a spectrum quantization unit 22, a low frequency feature quantity extraction unit 23, a high frequency feature quantity extraction unit 24, a spectrum characteristic determination unit 25, an extension coefficient calculation unit 26, an extension coefficient quantization unit 27, and a multiplexing unit 28.
  • the MDCT unit 21 is supplied with an input signal that is a time-series signal having a sampling frequency Fs [kHz], for example, as a speech signal to be encoded.
  • the MDCT unit 21 performs, for example, MDCT as orthogonal transform on the supplied input signal, and obtains a spectrum from the frequency Dc [kHz], which is a DC component, to a frequency Fs / 2 that is half the sampling frequency Fs.
  • MDCT is described here as an example of the orthogonal transform, but the transform is not limited to MDCT; any transform may be used as long as the spectrum value obtained by the orthogonal transform is a value in which both the amplitude component and the phase component are interwoven.
  • the band expansion is performed on the decoding side in order to further improve the encoding efficiency.
  • the spectrum obtained by orthogonal transformation in the MDCT unit 21 is divided into a low-frequency spectrum, a high-frequency spectrum, and a loss spectrum.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • The component of the entire spectrum from the frequency Dc, which is the direct current component, to the upper limit frequency Fb [kHz] is the low frequency spectrum, and when the input signal is encoded, normal encoding is performed on the low frequency spectrum.
  • the component from the upper limit frequency Fb to the frequency Fc in the whole spectrum is a high frequency spectrum.
  • The high-frequency spectrum is not encoded. Instead, at the time of decoding, the low-frequency spectrum and an extension coefficient, which is additional information described later, are used to generate a pseudo high-frequency spectrum (hereinafter also referred to as an extended spectrum), and band extension is thereby realized. That is, at the time of decoding, the frequency band from the upper limit frequency Fb to the frequency Fc is set as an extension band that is the target of band extension.
  • the portion from the frequency Fc to the frequency Fs / 2 in the whole spectrum is regarded as a loss spectrum and is lost.
  • the band from the frequency Dc to the upper limit frequency Fb is referred to as a low band, and the band from the upper limit frequency Fb to the frequency Fc is referred to as a high band.
  • a band from the frequency Fc to the frequency Fs / 2 is referred to as a loss band.
  • In short, the low frequency component of the input signal is encoded, and the high frequency component is generated by band expansion at the time of decoding.
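  • The three-way split into low band (Dc to Fb), high band (Fb to Fc), and loss band (Fc to Fs / 2) can be illustrated with a short sketch. The mapping from a frequency in kHz to a bin index assumes bins spaced uniformly over 0 to Fs / 2, and the function name `split_spectrum` is hypothetical; neither is taken from the text.

```python
from typing import List, Sequence, Tuple


def split_spectrum(
    spectrum: Sequence[float], fs_khz: float, fb_khz: float, fc_khz: float
) -> Tuple[List[float], List[float], List[float]]:
    """Split a full-band spectrum into (low, high, loss) parts at Fb and Fc."""
    nyquist = fs_khz / 2.0
    if not 0.0 < fb_khz < fc_khz <= nyquist:
        raise ValueError("expected 0 < Fb < Fc <= Fs / 2")
    n = len(spectrum)
    # Assumption: bin k covers frequencies around k * (Fs / 2) / n.
    kb = round(n * fb_khz / nyquist)
    kc = round(n * fc_khz / nyquist)
    low = list(spectrum[:kb])     # encoded normally
    high = list(spectrum[kb:kc])  # not encoded; regenerated by band extension
    loss = list(spectrum[kc:])    # discarded
    return low, high, loss
```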
  • The MDCT unit 21 performs MDCT on the input signal and, from the resulting full-band spectrum, supplies the low frequency spectrum to the spectrum quantization unit 22 and the low frequency feature quantity extraction unit 23, and supplies the high frequency spectrum to the high frequency feature quantity extraction unit 24.
  • the spectrum quantization unit 22 quantizes the low frequency spectrum supplied from the MDCT unit 21 and supplies the quantized low frequency spectrum obtained as a result to the multiplexing unit 28.
  • The low-frequency feature quantity extraction unit 23 extracts a feature quantity (hereinafter also referred to as a low-frequency spectrum feature quantity) from the low-frequency spectrum supplied from the MDCT unit 21 and supplies it to the spectral characteristic determination unit 25, and also supplies amplitude information of the low-frequency spectrum to the expansion coefficient calculation unit 26.
  • The high frequency feature quantity extraction unit 24 extracts a feature quantity (hereinafter also referred to as a high frequency spectrum feature quantity) from the high frequency spectrum supplied from the MDCT unit 21 and supplies it to the spectral characteristic determination unit 25, and also supplies amplitude information of the high frequency spectrum to the expansion coefficient calculation unit 26.
  • The spectrum obtained by the MDCT unit 21 is, for example, an MDCT spectrum obtained by MDCT, and the MDCT spectrum has properties different from those of a DFT spectrum obtained by DFT (Discrete Fourier Transform).
  • the MDCT spectrum is also called an MDCT coefficient.
  • the DFT spectrum includes an amplitude component and a phase component independently of each other.
  • In contrast, the value of the MDCT spectrum, that is, the value in each frequency bin of the MDCT spectrum, is a value in which both the amplitude component and the phase component are interwoven. The value of the MDCT spectrum is determined by the amplitude component and the phase component of the input signal, and neither the amplitude component alone nor the phase component alone can be recovered from the value of the MDCT spectrum.
  • IMDCT which is the inverse transform of MDCT
  • DFT is performed on the time-series signal for feature quantity extraction.
  • The encoding apparatus 11 calculates a pseudo amplitude spectrum S_k from the MDCT spectrum by the following equation (1) and uses it for feature quantity extraction.
  • In equation (1), S_k denotes the pseudo amplitude spectrum corresponding to the k-th frequency bin of the MDCT spectrum, and y_k denotes the value of the MDCT spectrum corresponding to the k-th frequency bin. Thus, in equation (1), the pseudo amplitude spectrum S_k for one frequency bin is calculated from the values of the MDCT spectrum at three consecutive frequency bins.
  • The value of the pseudo amplitude spectrum S_k obtained in this way approximates the amplitude spectrum. That is, S_k is strongly correlated with the amplitude spectrum of the DFT spectrum, and can therefore be regarded as a pseudo amplitude value at each frequency of the MDCT spectrum.
  • Hereinafter, the pseudo amplitude spectrum obtained for the low-frequency spectrum is also referred to as the low-frequency pseudo amplitude spectrum, and the pseudo amplitude spectrum obtained for the high-frequency spectrum as the high-frequency pseudo amplitude spectrum.
  • The low frequency feature quantity extraction unit 23 and the high frequency feature quantity extraction unit 24 calculate the pseudo amplitude spectrum S_k by equation (1) and calculate a feature quantity from the resulting pseudo amplitude spectrum S_k of each frequency bin.
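  • A pseudo amplitude spectrum built from three consecutive MDCT bins can be sketched as below. Equation (1) itself is not reproduced in this text, so the form used here, S_k = sqrt(y_k^2 + (y_{k+1} - y_{k-1})^2), is one commonly used MDCT pseudo-spectrum and is an assumption rather than the equation of the specification. Treating missing neighbours at the two edges as zero is likewise an assumption, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def pseudo_amplitude_spectrum(y: Sequence[float]) -> List[float]:
    """Estimate a phase-insensitive amplitude per bin from three neighbouring MDCT values."""
    n = len(y)
    out: List[float] = []
    for k in range(n):
        # Assumption: neighbours outside the spectrum are treated as zero.
        prev_val = y[k - 1] if k > 0 else 0.0
        next_val = y[k + 1] if k < n - 1 else 0.0
        # hypot(a, b) = sqrt(a^2 + b^2)
        out.append(math.hypot(y[k], next_val - prev_val))
    return out
```

  • The neighbour difference stands in for the component that a single MDCT value cannot express on its own, which is why the result tracks an amplitude spectrum more closely than |y_k| does.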
  • For example, the low-frequency feature quantity extraction unit 23 and the high-frequency feature quantity extraction unit 24 calculate, as the low-frequency spectrum feature quantity and the high-frequency spectrum feature quantity, the Spectral Flatness (hereinafter also referred to as SF), an index of how noise-like the spectrum is, by the following equation (2).
  • In equation (2), N denotes the number of target spectra, that is, the number of frequency bins, and S_i denotes the value of the pseudo amplitude spectrum of the i-th frequency bin.
  • For the high frequency spectrum, for example, SF is the ratio of the geometric mean of the pseudo amplitude spectrum S_i calculated for all frequency bins of the high frequency spectrum to the arithmetic mean of the pseudo amplitude spectrum S_i calculated for those same frequency bins.
  • The SF calculated in this way indicates the degree of flatness of the spectrum and takes a value in the range of 0.0 to 1.0.
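  • As an illustration of the flatness measure, the sketch below computes SF as the geometric mean divided by the arithmetic mean of the pseudo amplitude values, which is the standard definition consistent with the stated range of 0.0 to 1.0. The small `floor` value that guards against log(0), the final clamp to 1.0, and the function name are assumptions added for numerical safety.

```python
import math
from typing import Sequence


def spectral_flatness(s: Sequence[float], floor: float = 1e-12) -> float:
    """Return geometric mean / arithmetic mean of the magnitudes in s (0.0 to 1.0)."""
    if not s:
        raise ValueError("at least one frequency bin is required")
    # Assumption: magnitudes are floored so that the logarithm is always defined.
    vals = [max(abs(v), floor) for v in s]
    n = len(vals)
    geometric = math.exp(sum(math.log(v) for v in vals) / n)
    arithmetic = sum(vals) / n
    # Clamp to guard against rounding slightly above 1.0 for perfectly flat input.
    return min(1.0, geometric / arithmetic)
```

  • A value near 1.0 indicates a flat, noise-like spectrum, while a value near 0.0 indicates energy concentrated in a few bins, that is, high tonality.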
  • The feature quantity is not limited to SF. Any index indicating how noise-like, in other words how tonal, the spectrum is may be calculated as the feature quantity, chosen according to the accuracy of the feature quantity required by the encoding device 11 and the allowable amount of computation.
  • For example, the spectrum concentration degree D shown in the following equation (3) may be calculated as the low-frequency spectrum feature quantity or the high-frequency spectrum feature quantity.
  • In equation (3), N denotes the number of target spectra, that is, the number of frequency bins, S_i denotes the value of the pseudo amplitude spectrum corresponding to the i-th frequency bin, and Max(S_i) denotes the maximum value of the pseudo amplitude spectrum S_i over the frequency bins.
  • The ratio of the arithmetic average of the pseudo amplitude spectrum S_i to the maximum value of the pseudo amplitude spectrum S_i is the spectrum concentration degree D.
  • any feature amount may be calculated, but the description will be continued below assuming that SF is calculated as the feature amount.
  • When calculating the low-frequency spectrum feature quantity, the low-frequency feature quantity extraction unit 23 calculates the above-described SF for the low-frequency aliasing pseudo amplitude spectrum, which is obtained by folding the low-frequency pseudo amplitude spectrum calculated for the low-frequency spectrum back to the high band side about the upper limit frequency Fb, as shown in FIG. 3.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • the low-frequency pseudo-amplitude spectrum represented by the curve C11 is folded back to the high-frequency side at the position of the upper limit frequency Fb to be the low-frequency aliasing pseudo-amplitude spectrum represented by the curve C12. Therefore, the low frequency pseudo amplitude spectrum and the low frequency aliasing pseudo amplitude spectrum are symmetrical waveforms.
  • The low frequency feature quantity extraction unit 23 calculates SF as the low-frequency spectrum feature quantity by evaluating equation (2) over the frequency bins in the range from the upper limit frequency Fb to the frequency Fc of the low frequency aliasing pseudo amplitude spectrum obtained by the folding.
  • SF calculated as a low-frequency spectrum feature amount is particularly referred to as SFL.
  • The low frequency feature quantity extraction unit 23 supplies the SFL obtained in this way to the spectrum characteristic determination unit 25 as the low frequency spectrum feature quantity, and also supplies the low frequency aliasing pseudo amplitude spectrum to the expansion coefficient calculation unit 26 as amplitude information. At this time, for example, the portion of the low frequency aliasing pseudo amplitude spectrum from the upper limit frequency Fb to the frequency Fc is supplied to the expansion coefficient calculation unit 26.
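  • The folding about the upper limit frequency Fb can be sketched as a simple mirror of the low-frequency pseudo amplitude spectrum into the range from Fb to Fc. Whether the bin at Fb itself is included in the mirror is not stated in the text; the indexing below (the first folded bin equals the last low-band bin) is an assumption, and the function name `fold_low_band` is hypothetical.

```python
from typing import List, Sequence


def fold_low_band(low_pseudo: Sequence[float], n_high: int) -> List[float]:
    """Mirror the low-band pseudo amplitude spectrum about its upper edge (Fb).

    Returns n_high values covering the band from Fb to Fc.
    """
    kb = len(low_pseudo)
    if not 0 <= n_high <= kb:
        raise ValueError("the high band must not be wider than the low band")
    # Assumption: folded bin j (counted upward from Fb) copies low bin kb - 1 - j.
    return [low_pseudo[kb - 1 - j] for j in range(n_high)]
```

  • Because the folded spectrum lives on the same bins as the high band, it can be compared bin by bin, or band by band, with the high-frequency pseudo amplitude spectrum.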
  • the high frequency feature quantity extraction unit 24 calculates SF as the high frequency spectrum feature quantity by calculating the equation (2) for each frequency bin of the high frequency pseudo amplitude spectrum obtained from the high frequency spectrum.
  • SF calculated as a high-frequency spectrum feature is particularly referred to as SFH.
  • The high frequency feature quantity extraction unit 24 supplies the SFH obtained in this way to the spectrum characteristic determination unit 25 as the high frequency spectrum feature quantity, and also supplies the high frequency pseudo amplitude spectrum to the expansion coefficient calculation unit 26 as amplitude information.
  • Based on the low-frequency spectrum feature quantity supplied from the low-frequency feature quantity extraction unit 23 and the high-frequency spectrum feature quantity supplied from the high-frequency feature quantity extraction unit 24, the spectral characteristic determination unit 25 generates a spectral characteristic code indicating the spectral characteristic of the input signal.
  • When the tonality is high, the spectrum characteristic code is a code indicating high tonality, meaning that the input signal (MDCT spectrum) has the spectral characteristic of high tonality. Here, the value of the spectrum characteristic code indicating high tonality is assumed to be "1".
  • Otherwise, the spectrum characteristic code is a code indicating that the tonality is not high. That is, the input signal has the spectral characteristic that the tonality is not high, in other words, that the noise property is high. The value of the spectrum characteristic code indicating that the tonality is not high is "0".
  • In other words, the spectral characteristic code is set to "1" when the tonality of both the low frequency component and the high frequency component of the MDCT spectrum is high, and to "0" when the noise property of at least one of the low frequency component and the high frequency component of the MDCT spectrum is high.
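  • The decision described above can be sketched as a pair of threshold comparisons on SFL and SFH. The concrete decision rule is not reproduced in this text, so comparing each SF value against a threshold, the threshold values of 0.3, and the function name are all assumptions chosen only for illustration.

```python
def spectral_characteristic_code(
    sfl: float,
    sfh: float,
    low_threshold: float = 0.3,   # hypothetical value, not from the text
    high_threshold: float = 0.3,  # hypothetical value, not from the text
) -> int:
    """Return 1 when both bands look tonal (low SF), otherwise 0."""
    low_is_tonal = sfl < low_threshold
    high_is_tonal = sfh < high_threshold
    # "1" only if neither band is noise-like; a single noise-like band gives "0".
    return 1 if (low_is_tonal and high_is_tonal) else 0
```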
  • the spectrum characteristic determination unit 25 supplies the spectrum characteristic code obtained in this way to the extension coefficient calculation unit 26, the extension coefficient quantization unit 27, and the multiplexing unit 28.
  • The expansion coefficient calculation unit 26 calculates the expansion coefficient based on the low-frequency aliasing pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23, the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and the spectrum characteristic code from the spectrum characteristic determination unit 25, and supplies it to the expansion coefficient quantization unit 27.
  • the expansion coefficient is information for performing level adjustment of the high frequency in the frequency domain at the time of decoding, and indicates the ratio of the levels of the high frequency pseudo amplitude spectrum and the low frequency aliasing pseudo amplitude spectrum. In other words, the expansion coefficient indicates the ratio between the average amplitude of the high frequency spectrum and the average amplitude of the low frequency spectrum.
  • When a single expansion coefficient is calculated, the expansion coefficient calculation unit 26 calculates the average of the high frequency pseudo amplitude spectrum values of the frequency bins in the high band, that is, the band from the upper limit frequency Fb to the frequency Fc. It likewise calculates the average of the low-frequency aliasing pseudo-amplitude spectrum values of the frequency bins in the band from the upper limit frequency Fb to the frequency Fc, and takes the value obtained by dividing the former average by the latter as the expansion coefficient. In this case, one expansion coefficient is obtained for the entire high band, that is, the entire expansion band.
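  • The single-coefficient case reduces to a ratio of two averages over the band from Fb to Fc. The following sketch assumes both inputs are already aligned on the same bins; the function name and the choice to raise an error when the denominator is zero are assumptions.

```python
from typing import Sequence


def single_expansion_coefficient(
    high_pseudo: Sequence[float], folded_low_pseudo: Sequence[float]
) -> float:
    """Average of the high-band pseudo amplitudes divided by the average of the folded low band."""
    if not high_pseudo or len(high_pseudo) != len(folded_low_pseudo):
        raise ValueError("both inputs must be non-empty and cover the same bins")
    n = len(high_pseudo)
    mean_high = sum(high_pseudo) / n
    mean_low = sum(folded_low_pseudo) / n
    if mean_low == 0.0:
        # Assumption: a silent reference band is treated as an error here.
        raise ValueError("folded low-band average is zero")
    return mean_high / mean_low
```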
  • the expansion coefficient calculation unit 26 divides from the low frequency side to the high frequency side in consideration of human auditory characteristics, for example, as shown in FIG.
  • the high band is divided into a plurality of bands so that the obtained bandwidth becomes wide, and the expansion coefficient is calculated for each band.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • In this example, the frequency band of the high-frequency spectrum, that is, the band from the upper limit frequency Fb to the frequency Fc, is divided into five bands, band B1 to band B5.
  • the width of each band obtained by the division is wider as the band is on the frequency Fc side.
  • For each of the bands B1 to B5 constituting the high band, the expansion coefficient calculation unit 26 divides the average value of the high-frequency pseudo-amplitude spectrum by the average value of the low-frequency aliasing pseudo-amplitude spectrum, and sets the obtained value as the expansion coefficient of that band.
  • For example, the value obtained by dividing the average value of the high-frequency pseudo-amplitude spectrum over the frequency bins in the band B1 by the average value of the low-frequency aliasing pseudo-amplitude spectrum over the frequency bins in the band B1 is the expansion coefficient of the band B1.
  • The expansion coefficient C_i of the i-th band (region) obtained by dividing the high band is calculated by the following equation (4).

  $$C_i = \frac{\dfrac{1}{M}\sum_{k} S_k}{\dfrac{1}{M}\sum_{k} L_k} \qquad (4)$$

  • In equation (4), S_k represents the value of the high-frequency pseudo-amplitude spectrum of the k-th frequency bin in the i-th band, and L_k represents the value of the low-frequency aliasing pseudo-amplitude spectrum of the k-th frequency bin in the i-th band.
  • M represents the number of spectra in the i-th band, that is, the number of frequency bins, and each sum runs over those M bins.
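  • As a minimal sketch of equation (4), the following Python function computes the ratio of the two band averages. The function name, the list-based inputs, and the handling of a zero low-band average are assumptions made for illustration and are not taken from the specification.

```python
def expansion_coefficient(high_pseudo_amp, low_folded_pseudo_amp):
    """Ratio of band averages: mean(S_k) / mean(L_k) over the same M bins."""
    if len(high_pseudo_amp) != len(low_folded_pseudo_amp) or not high_pseudo_amp:
        raise ValueError("both inputs must cover the same non-empty set of bins")
    m = len(high_pseudo_amp)                      # M: number of bins in the band
    mean_high = sum(high_pseudo_amp) / m          # average of S_k
    mean_low = sum(low_folded_pseudo_amp) / m     # average of L_k
    if mean_low == 0.0:
        # Assumption: the source does not say what happens for a silent low band.
        raise ValueError("low-band average is zero; coefficient undefined")
    return mean_high / mean_low


# Example: the high band is on average one fifth of the folded low band.
print(expansion_coefficient([0.2, 0.4, 0.6], [1.0, 2.0, 3.0]))
```

  • Because both averages use the same M, the factor 1/M cancels and the coefficient equals the ratio of the two sums.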
  • The expansion coefficient quantization unit 27 quantizes the expansion coefficient supplied from the expansion coefficient calculation unit 26 based on the spectral characteristic code supplied from the spectrum characteristic determination unit 25, and supplies the resulting quantized expansion coefficient to the multiplexing unit 28.
  • The multiplexing unit 28 multiplexes the quantized low-frequency spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectrum characteristic determination unit 25, and the quantized expansion coefficient from the expansion coefficient quantization unit 27, and outputs the resulting code string.
  • the multiplexing unit 28 performs entropy encoding on the quantized low frequency spectrum and also encodes the quantization extension coefficient.
  • the encoding device 11 starts an encoding process and encodes the input signal.
  • the encoding process performed by the encoding device 11 will be described with reference to the flowchart of FIG.
  • In step S11, the MDCT unit 21 performs MDCT on the supplied input signal. The MDCT unit 21 then supplies the low-frequency part of the resulting MDCT spectrum, as a low-frequency spectrum, to the spectrum quantization unit 22 and the low-frequency feature quantity extraction unit 23, and supplies the high-frequency part of the MDCT spectrum, as a high-frequency spectrum, to the high-frequency feature quantity extraction unit 24.
  • In step S12, the spectrum quantization unit 22 quantizes the low-frequency spectrum supplied from the MDCT unit 21, and supplies the resulting quantized low-frequency spectrum to the multiplexing unit 28.
  • In step S13, the low-frequency feature quantity extraction unit 23 extracts a low-frequency spectrum feature quantity from the low-frequency spectrum supplied from the MDCT unit 21.
  • the low frequency feature quantity extraction unit 23 calculates the above-described equation (1) for each frequency bin of the low frequency spectrum, and calculates a low frequency pseudo amplitude spectrum.
  • The low-frequency feature quantity extraction unit 23 folds the obtained low-frequency pseudo-amplitude spectrum to the high-frequency side at the upper limit frequency Fb to obtain the low-frequency aliasing pseudo-amplitude spectrum. At this time, for example, the low-frequency feature quantity extraction unit 23 discards the portion of the folded low-frequency pseudo-amplitude spectrum that lies above the frequency Fc.
  • the low frequency feature quantity extraction unit 23 calculates the above-described formula (2) for each frequency bin of the low frequency aliasing pseudo amplitude spectrum, and calculates SFL as the low frequency spectrum feature quantity.
  • the low frequency feature quantity extraction unit 23 supplies the SFL as the calculated low frequency spectrum feature value to the spectrum characteristic determination unit 25 and supplies the low frequency aliasing pseudo amplitude spectrum to the expansion coefficient calculation unit 26.
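  • The folding in step S13 can be sketched as a mirror of the low band about its upper edge followed by truncation at Fc. The bin alignment (the bin just below Fb maps to the first bin above Fb), the function name, and the error for a low band narrower than the high band are assumptions for illustration.

```python
def fold_low_band(low_spectrum, num_high_bins):
    """Mirror the low band about Fb and keep only the bins up to Fc."""
    if len(low_spectrum) < num_high_bins:
        # Assumption: the source only describes discarding a surplus above Fc.
        raise ValueError("low band is narrower than the high band")
    mirrored = low_spectrum[::-1]        # bin just below Fb lands just above Fb
    return mirrored[:num_high_bins]      # discard everything that would exceed Fc


# Five low-band bins folded into a three-bin high band.
print(fold_low_band([1, 2, 3, 4, 5], 3))
```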
  • In step S14, the high-frequency feature quantity extraction unit 24 extracts a high-frequency spectrum feature quantity from the high-frequency spectrum supplied from the MDCT unit 21.
  • The high-frequency feature quantity extraction unit 24 calculates the above-described equation (1) for each frequency bin of the high-frequency spectrum to obtain the high-frequency pseudo-amplitude spectrum, and also performs the calculation of equation (2) to obtain SFH as the high-frequency spectrum feature quantity.
  • the high frequency feature quantity extraction unit 24 supplies SFH as the calculated high frequency spectrum feature value to the spectrum characteristic determination unit 25 and also supplies the high frequency pseudo amplitude spectrum to the expansion coefficient calculation unit 26.
  • In step S15, the spectrum characteristic determination unit 25 generates a spectral characteristic code indicating the spectral characteristic, based on the low-frequency spectrum feature quantity supplied from the low-frequency feature quantity extraction unit 23 and the high-frequency spectrum feature quantity supplied from the high-frequency feature quantity extraction unit 24.
  • The spectrum characteristic determination unit 25 generates a spectral characteristic code having a value of “1” when both SFL, the low-frequency spectrum feature quantity, and SFH, the high-frequency spectrum feature quantity, are less than the threshold.
  • Otherwise, the spectrum characteristic determination unit 25 generates a spectral characteristic code having a value of “0”.
  • the spectrum characteristic determination unit 25 supplies the generated spectrum characteristic code to the extension coefficient calculation unit 26, the extension coefficient quantization unit 27, and the multiplexing unit 28.
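  • The decision in step S15 reduces to a two-sided threshold test, sketched below. The function name and the threshold value used in the example are hypothetical; the specification only states that a threshold is used.

```python
def spectral_characteristic_code(sfl, sfh, threshold):
    """Return 1 when both SFL and SFH are below the threshold, otherwise 0."""
    return 1 if (sfl < threshold and sfh < threshold) else 0


# Hypothetical threshold of 0.5, chosen only for this example.
print(spectral_characteristic_code(0.1, 0.2, 0.5))  # both below the threshold
print(spectral_characteristic_code(0.1, 0.7, 0.5))  # SFH is not below the threshold
```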
  • In step S16, the expansion coefficient calculation unit 26 and the expansion coefficient quantization unit 27 determine whether the spectral characteristic indicates high tonality, based on the spectral characteristic code supplied from the spectrum characteristic determination unit 25.
  • If it is determined in step S16 that the spectral characteristic indicates high tonality, the process proceeds to step S17.
  • In step S17, the expansion coefficient calculation unit 26 calculates a single expansion coefficient for the entire high band, based on the low-frequency aliasing pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies it to the expansion coefficient quantization unit 27.
  • That is, for the band from the upper limit frequency Fb to the frequency Fc, the expansion coefficient calculation unit 26 divides the average value of the high-frequency pseudo-amplitude spectrum over the frequency bins by the average value of the low-frequency aliasing pseudo-amplitude spectrum over the frequency bins to calculate the expansion coefficient.
  • On the other hand, if it is determined in step S16 that the spectral characteristic does not indicate high tonality, the process proceeds to step S18.
  • In step S18, the expansion coefficient calculation unit 26 calculates an expansion coefficient for each of the bands obtained by dividing the high band, based on the low-frequency aliasing pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies the expansion coefficients to the expansion coefficient quantization unit 27.
  • For example, the expansion coefficient calculation unit 26 divides the entire high band into the five bands B1 to B5 as shown in FIG. 4 and performs the calculation of the above-described equation (4) for each band. In this case, one expansion coefficient is calculated for each of the bands B1 to B5.
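  • The branch in steps S16 to S18 can be sketched as follows. The boolean flag `high_tonality`, the `band_edges` list of (start, end) bin indices relative to Fb, and the function names are hypothetical; how the spectral characteristic code maps to the flag is left to the caller.

```python
def mean(values):
    return sum(values) / len(values)


def expansion_coefficients(high_tonality, high_amp, low_folded_amp, band_edges):
    """Return one coefficient for the whole high band, or one per band."""
    if high_tonality:
        # Step S17: a single coefficient over the band from Fb to Fc.
        return [mean(high_amp) / mean(low_folded_amp)]
    # Step S18: one coefficient per divided band (end index is exclusive).
    return [mean(high_amp[s:e]) / mean(low_folded_amp[s:e]) for s, e in band_edges]


high = [2.0, 2.0, 4.0, 4.0]
low = [1.0, 1.0, 1.0, 1.0]
bands = [(0, 2), (2, 4)]
print(expansion_coefficients(True, high, low, bands))
print(expansion_coefficients(False, high, low, bands))
```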
  • In step S19, the expansion coefficient quantization unit 27 quantizes the expansion coefficient supplied from the expansion coefficient calculation unit 26, and supplies the resulting quantized expansion coefficient to the multiplexing unit 28.
  • In step S20, the multiplexing unit 28 multiplexes the quantized low-frequency spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectrum characteristic determination unit 25, and the quantized expansion coefficient from the expansion coefficient quantization unit 27 to generate a code string. At this time, the multiplexing unit 28 encodes the quantized low-frequency spectrum and the quantized expansion coefficient, and then multiplexes the encoded quantized low-frequency spectrum, the encoded quantized expansion coefficient, and the spectral characteristic code.
  • The multiplexing unit 28 outputs the code string obtained by multiplexing, and the encoding process ends.
  • As described above, the encoding device 11 determines the spectral characteristic of the input signal based on the low-frequency spectrum feature quantity and the high-frequency spectrum feature quantity. The encoding device 11 then calculates, according to the spectral characteristic, a different expansion coefficient for adjusting the level of the high band in the frequency domain at the time of decoding.
  • Because the level of the high band can be adjusted in the frequency domain, the time delay due to band expansion at the time of decoding is reduced, and an increase in resources on the decoding side is also suppressed.
  • In addition, because the high-band level can be adjusted according to the spectral characteristic, deterioration of audible sound quality can be suppressed for both high-tonality and low-tonality signals, and higher-quality sound can be obtained.
  • FIG. 6 is a diagram illustrating a configuration example of an embodiment of a decoding device to which the present technology is applied.
  • The decoding device 81 includes a decomposition unit 91, a spectrum inverse quantization unit 92, an expansion coefficient inverse quantization unit 93, an extended spectrum generation unit 94, and an IMDCT unit 95.
  • the code sequence output from the multiplexing unit 28 of the encoding device 11 is supplied to the decomposition unit 91.
  • the decomposition unit 91 decomposes the supplied code string and obtains a quantized low frequency spectrum, a spectrum characteristic code, and a quantization extension coefficient from the code string.
  • the decomposition unit 91 also decodes the quantized low frequency spectrum and the quantized expansion coefficient.
  • The decomposition unit 91 supplies the quantized low-frequency spectrum obtained from the code string to the spectrum inverse quantization unit 92, and supplies the spectral characteristic code obtained from the code string to the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94. In addition, the decomposition unit 91 supplies the quantized expansion coefficient obtained from the code string to the expansion coefficient inverse quantization unit 93.
  • the spectrum inverse quantization unit 92 inversely quantizes the quantized low frequency spectrum supplied from the decomposition unit 91 and supplies the obtained low frequency spectrum to the extended spectrum generation unit 94 and the IMDCT unit 95.
  • The expansion coefficient inverse quantization unit 93 inversely quantizes the quantized expansion coefficient supplied from the decomposition unit 91 based on the spectral characteristic code supplied from the decomposition unit 91, and supplies the obtained expansion coefficient to the extended spectrum generation unit 94.
  • The extended spectrum generation unit 94 generates an extended spectrum using the expansion coefficient supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies the extended spectrum to the IMDCT unit 95.
  • The IMDCT unit 95 treats the low-frequency spectrum supplied from the spectrum inverse quantization unit 92 as the low-band spectrum and the extended spectrum supplied from the extended spectrum generation unit 94 as the high-band (extended-band) spectrum, and combines (synthesizes) the two. The IMDCT unit 95 then performs IMDCT (inverse MDCT) on the combined spectrum and outputs the resulting time-series signal as the audio signal obtained by decoding.
  • When the code string is supplied, the decoding device 81 starts a decoding process, decodes the code string, and outputs an audio signal.
  • the decoding processing by the decoding device 81 will be described with reference to the flowchart of FIG.
  • In step S51, the decomposition unit 91 decomposes the supplied code string and obtains the quantized low-frequency spectrum, the spectral characteristic code, and the quantized expansion coefficient from the code string.
  • The decomposition unit 91 supplies the obtained quantized low-frequency spectrum to the spectrum inverse quantization unit 92, supplies the spectral characteristic code to the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94, and supplies the quantized expansion coefficient to the expansion coefficient inverse quantization unit 93.
  • More specifically, the decomposition unit 91 decodes the quantized low-frequency spectrum and the quantized expansion coefficient, and supplies the decoded quantized low-frequency spectrum and the decoded quantized expansion coefficient to the spectrum inverse quantization unit 92 and the expansion coefficient inverse quantization unit 93, respectively.
  • In step S52, the spectrum inverse quantization unit 92 inversely quantizes the quantized low-frequency spectrum supplied from the decomposition unit 91, and supplies the obtained low-frequency spectrum to the extended spectrum generation unit 94 and the IMDCT unit 95.
  • In step S53, the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine whether the spectral characteristic indicates high tonality, based on the spectral characteristic code supplied from the decomposition unit 91.
  • When the spectral characteristic indicates high tonality, the code string includes the quantized expansion coefficient for obtaining the one (single) expansion coefficient calculated for the entire high band, so one quantized expansion coefficient is supplied from the decomposition unit 91 to the expansion coefficient inverse quantization unit 93.
  • When the spectral characteristic does not indicate high tonality, quantized expansion coefficients corresponding to the number of bands into which the high band is divided are supplied from the decomposition unit 91 to the expansion coefficient inverse quantization unit 93.
  • In step S54, the expansion coefficient inverse quantization unit 93 inversely quantizes the single quantized expansion coefficient supplied from the decomposition unit 91, and supplies the obtained expansion coefficient to the extended spectrum generation unit 94.
  • In step S55, the extended spectrum generation unit 94 generates an extended spectrum based on the single expansion coefficient supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies the extended spectrum to the IMDCT unit 95.
  • In the same manner as the example described with reference to FIG. , the extended spectrum generation unit 94 folds the low-frequency spectrum to the high-band side with the upper limit frequency Fb as the boundary, and uses the result as a seed spectrum for obtaining the extended spectrum.
  • The extended spectrum generation unit 94 then multiplies the entire seed spectrum, that is, the value of the seed spectrum in each frequency bin, by the single expansion coefficient to obtain the extended spectrum. In other words, the expansion coefficient adjusts the level of the seed spectrum to the level of the original high-frequency spectrum before encoding, and the result is used as the extended spectrum.
  • the extended spectrum obtained in this way is the high-frequency spectrum of the original input signal estimated from the low-frequency spectrum obtained by decoding and the expansion coefficient.
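  • Step S55 can be sketched as a fold followed by a uniform gain. The function name and the mirror-style bin alignment are assumptions for illustration; the inputs here are signed spectral values, whose signs the multiplication leaves unchanged.

```python
def extend_single(low_spectrum, num_high_bins, coeff):
    """Fold the decoded low band about Fb and scale every bin by one coefficient."""
    seed = low_spectrum[::-1][:num_high_bins]   # seed spectrum from the folded low band
    return [coeff * value for value in seed]    # same gain in every frequency bin


print(extend_single([1.0, -2.0, 4.0], 2, 0.5))
```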
  • On the other hand, if it is determined in step S53 that the spectral characteristic does not indicate high tonality, that is, indicates high noise characteristics, the process proceeds to step S56.
  • In step S56, the expansion coefficient inverse quantization unit 93 inversely quantizes the quantized expansion coefficient of each of the plurality of bands constituting the high band, supplied from the decomposition unit 91, and supplies the obtained expansion coefficients to the extended spectrum generation unit 94. As a result, for example, the expansion coefficient of each of the bands B1 to B5 shown in FIG. 4 is obtained.
  • In step S57, the extended spectrum generation unit 94 generates an extended spectrum based on the expansion coefficient of each band supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies the extended spectrum to the IMDCT unit 95.
  • The extended spectrum generation unit 94 generates a seed spectrum by the same process as in step S55 and multiplies each band (region) of the obtained seed spectrum by the expansion coefficient of that band to obtain the extended spectrum.
  • For example, when the high band is divided into the five bands B1 to B5 as shown in FIG. 4, the band B1 portion of the seed spectrum, more specifically the value of the seed spectrum in each frequency bin in the band B1, is multiplied by the expansion coefficient of the band B1 to generate the band B1 portion of the extended spectrum. Similarly, for the other bands B2 to B5, each band of the seed spectrum is multiplied by the expansion coefficient of that band to generate the corresponding portion of the extended spectrum.
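  • The per-band scaling of step S57 can be sketched as below. The `band_edges` list of (start, end) bin indices within the seed spectrum and the function name are hypothetical.

```python
def extend_per_band(seed, band_edges, coeffs):
    """Scale each band of the seed spectrum by that band's expansion coefficient."""
    extended = list(seed)
    for (start, end), coeff in zip(band_edges, coeffs):
        for k in range(start, end):             # end index is exclusive
            extended[k] = coeff * seed[k]
    return extended


# Two bands: bins 0-1 scaled by 0.5, bins 2-4 scaled by 2.0.
print(extend_per_band([1.0, 1.0, 2.0, 2.0, 2.0], [(0, 2), (2, 5)], [0.5, 2.0]))
```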
  • In steps S55 and S57, the low-frequency spectrum folded to the high-band side is used as the seed spectrum.
  • However, the seed spectrum may be generated in any way. For example, a spectrum obtained by duplicating (copying) part of the frequency band of the low-frequency spectrum and pasting it onto the high band may be used as the seed spectrum.
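  • One possible form of the copy-based seed spectrum is sketched below. Repeating the copied patch until the high band is filled is an assumption of this sketch, as are the function name and the index parameters; the specification only says that part of the low-frequency spectrum is copied and pasted onto the high band.

```python
def copy_seed(low_spectrum, src_start, src_end, num_high_bins):
    """Build a seed spectrum by pasting a slice of the low band onto the high band."""
    patch = low_spectrum[src_start:src_end]
    if not patch:
        raise ValueError("source slice is empty")
    seed = []
    while len(seed) < num_high_bins:            # assumption: tile the patch as needed
        seed.extend(patch)
    return seed[:num_high_bins]


print(copy_seed([1, 2, 3, 4, 5, 6], 2, 5, 5))
```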
  • In step S58, the IMDCT unit 95 generates a time-series signal based on the low-frequency spectrum supplied from the spectrum inverse quantization unit 92 and the extended spectrum supplied from the extended spectrum generation unit 94.
  • The IMDCT unit 95 combines (synthesizes) the low-frequency spectrum and the extended spectrum to generate a spectrum having all band components of the low band and the high band (extended band), and performs IMDCT on that spectrum to obtain a time-series signal. A time-series signal to which a high-frequency component has been added by band expansion is thereby obtained.
  • the IMDCT section 95 outputs the time-series signal obtained in this way as an audio signal obtained by decoding, and the decoding process ends.
  • As described above, the decoding device 81 obtains the expansion coefficient corresponding to the spectral characteristic by decoding and inverse quantization, and generates an extended spectrum from the obtained expansion coefficient and the seed spectrum obtained by folding the low-frequency spectrum to the high-band side.
  • By adjusting the level of the seed spectrum, which serves as the high-band component, with the expansion coefficient to obtain the extended spectrum, the high-band level can be adjusted in the frequency domain, and high-band level adjustment according to the spectral characteristic can be realized.
  • The extended spectrum generation unit 94 distinguishes, based on the spectral characteristic code, whether the original signal before encoding is a signal with high tonality or a normal signal with high noise characteristics, and generates the extended spectrum accordingly.
  • a signal having a high tonality and a normal signal having a high noise characteristic have different spectral outlines.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • A curve C21 represents the spectrum of a signal with high noise characteristics, that is, a normal signal, and a curve C22 represents the spectrum of a signal with high tonality.
  • the highly noisy signal represented by curve C21 has no protruding portion in the entire frequency band, and the spectrum waveform has a gentle mountain shape. In other words, a signal with high noise characteristics does not have a portion where energy is concentrated.
  • the signal with high tonality represented by the curve C22 has energy concentrated on a specific frequency, and the waveform of the portion is like a sharply pointed mountain. That is, the waveform of the spectrum of the signal with high tonality is not a gentle waveform because the frequency portion where energy is concentrated protrudes.
  • As described above, a spectrum derived from the low-frequency spectrum, such as the low-frequency spectrum folded at the upper limit frequency Fb or part of the low-frequency spectrum copied and pasted onto the high band, is used as the seed spectrum. The seed spectrum is then level-adjusted by the expansion coefficient, that is, its amplitude is adjusted, to obtain the extended spectrum.
  • For a signal with high noise characteristics, the phase relationship between adjacent spectral values is not so important for hearing, whereas the amplitude level is important. Therefore, when adjusting the level of the seed spectrum, it is desirable to perform level adjustment in fine units so that the level (amplitude) of the seed spectrum is as close as possible to the level of the high-frequency spectrum of the original signal before encoding.
  • the high frequency band is divided into four bands at the time of encoding, and the expansion coefficient is calculated for each band.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • The frequency band of the high-frequency spectrum, that is, the band from the upper limit frequency Fb to the frequency Fc, is divided into four bands (regions) B11 to B14.
  • the width of each band obtained by the division is wider as the band is on the frequency Fc side.
  • each of the straight lines L11 to L14 represents the average value of the high frequency pseudo-amplitude spectrum in each of the bands B11 to B14, that is, the average amplitude of the high frequency spectrum.
  • A value obtained by dividing the average value of the high-frequency pseudo-amplitude spectrum obtained for each band by the average value of the low-frequency aliasing pseudo-amplitude spectrum of the same band is stored in the code string as an expansion coefficient and transmitted to the decoding device 81.
  • the level of the seed spectrum obtained from the low frequency spectrum is adjusted by the extension coefficient.
  • The vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • the same reference numerals are given to the portions corresponding to those in FIG. 9, and the description thereof will be omitted as appropriate.
  • A curve C31 represents the low-frequency spectrum obtained by decoding a code string, and a curve C32 represents the seed spectrum obtained from that low-frequency spectrum.
  • the low-frequency spectrum represented by the curve C31 is folded back to the high-frequency side at the upper limit frequency Fb to obtain the seed spectrum represented by the curve C32.
  • Each of the bands B11 to B14 of such a seed spectrum is multiplied by each of the expansion coefficients calculated for each band.
  • The level of the seed spectrum is adjusted in each of the bands B11 to B14 so that the average amplitude of each band of the seed spectrum approaches the average amplitude of the high-frequency spectrum of the original signal, as indicated by the arrows in the figure.
  • However, when the low-frequency spectrum is a signal with high tonality and the seed spectrum is multiplied by a different expansion coefficient for each band, the level of each band of the extended spectrum, that is, the average amplitude, is adjusted, but the phase relationship of the spectrum is greatly broken in each band.
  • As a result, the tonality of the extended spectrum is impaired as shown in FIG.
  • The vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • A curve C41 represents the MDCT spectrum of the input signal to be encoded, and a curve C42 represents the spectrum obtained by combining the low-frequency spectrum and the extended spectrum generated when the encoded input signal is decoded. Therefore, in the spectrum represented by the curve C42, the portion from the frequency Dc to the upper limit frequency Fb is the low-frequency spectrum, and the portion from the upper limit frequency Fb to the frequency Fc is the extended spectrum.
  • the original input signal is a signal with high tonality in both low and high frequencies.
  • When the level of the seed spectrum is adjusted with a different expansion coefficient for each band of the high band, the phase relationship of the spectrum is significantly disrupted, as shown by the curve C42, and the tonality of the extended band is impaired.
  • In the spectrum represented by the curve C42, the waveform of the high band, that is, of the extended spectrum, has collapsed, and the tonality that the original MDCT spectrum had has been lost.
  • the waveform tends to collapse at the boundary between the high-frequency divided bands, and the tonality is easily lost.
  • In contrast, the seed spectrum obtained by folding the low-frequency spectrum, as it is, that is, before level adjustment by the expansion coefficient, maintains the phase relationship of the spectrum, and thus its tonality is also maintained.
  • However, unless the level (amplitude) of the seed spectrum is adjusted, the amplitude level of the high-frequency spectrum of the original input signal cannot be reflected in the extended spectrum.
  • In that case, the volume of the high band, that is, the extended band, differs from that of the original high band, so appropriate band expansion cannot be realized. In other words, higher-quality sound cannot be obtained.
  • Therefore, for a signal with high tonality, both retention of the tonality in the extended spectrum and reflection of the amplitude level are realized by keeping the number of level-adjustment units for the seed spectrum to a minimum.
  • The expansion coefficient calculation unit 26 divides the average value of the high-frequency pseudo-amplitude spectrum in the entire high band (extended band) by the average value of the low-frequency aliasing pseudo-amplitude spectrum in the entire high band, thereby calculating a single expansion coefficient for the extended band.
  • The extended spectrum generation unit 94 multiplies the entire seed spectrum by the single expansion coefficient to obtain the extended spectrum. That is, the level of the seed spectrum is adjusted with the entire extended band (high band) as one unit to obtain the extended spectrum.
  • The overall amplitude level of the high band of the extended spectrum can thus also be brought close to the amplitude level of the high band of the original input signal.
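  • The following toy example illustrates one aspect of this argument: a single gain leaves the relative levels of spectral peaks unchanged, while different gains per band alter them. The seed values, gains, band split, and the peak-ratio measure are all invented for illustration and are not a tonality measure used in the specification.

```python
# Toy seed spectrum with two peaks (bins 2 and 5) over a low floor.
seed = [0.1, 0.1, 4.0, 0.1, 0.1, 2.0, 0.1, 0.1]


def peak_ratio(spectrum, i=2, j=5):
    """Level of the peak at bin i relative to the peak at bin j."""
    return abs(spectrum[i]) / abs(spectrum[j])


# Single coefficient for the whole extended band.
single = [0.5 * v for v in seed]

# Different coefficient for each of two bands (bins 0-3 and bins 4-7).
per_band = [0.25 * v for v in seed[:4]] + [1.5 * v for v in seed[4:]]

print(peak_ratio(seed), peak_ratio(single), peak_ratio(per_band))
```

  • In this example the peak ratio of the uniformly scaled spectrum equals that of the seed, whereas the per-band version changes it, because the two peaks fall in bands with different gains.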
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • curves C51 to C53 represent the MDCT spectrum of the original input signal, the low-frequency spectrum obtained by inverse quantization at the time of decoding, and the seed spectrum, respectively.
  • The MDCT spectrum represented by the curve C51 is a signal with high tonality: both its low-frequency part and its high-frequency part, that is, the low-frequency spectrum and the high-frequency spectrum, have energy concentrated at specific frequencies.
  • the average amplitude of the low frequency spectrum is larger than the average amplitude of the high frequency spectrum.
  • a straight line L21 represents the average value of the high-frequency pseudo-amplitude spectrum in the high frequency band (extended band), that is, the average amplitude of the high frequency spectrum.
  • The low-frequency spectrum represented by the curve C52 is folded to obtain the seed spectrum represented by the curve C53, and the level of the seed spectrum is adjusted by the expansion coefficient, as indicated by the arrow in the figure, to obtain the extended spectrum.
  • the average amplitude of the entire high frequency range of the extended spectrum is made closer to the average value of the high frequency pseudo amplitude spectrum represented by the straight line L21 by a single expansion coefficient.
  • Moreover, because the expansion coefficient is single, the amount of additional information necessary for band expansion stored in the code string output from the encoding device 11 can be reduced. A correspondingly larger amount of information can then be allocated to quantization of the low-frequency spectrum, which improves the overall sound quality.
  • Although not frequent, there are input signals, for example as shown in FIG. 13, whose spectral characteristic is such that the tonality of the low-frequency spectrum is high and the tonality of the high-frequency spectrum is low.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • curve C61 represents the MDCT spectrum of the input signal to be encoded.
  • The portion from the frequency Dc to the upper limit frequency Fb is the low-frequency spectrum, and the portion from the upper limit frequency Fb to the frequency Fc is the high-frequency spectrum.
  • In the low-frequency spectrum, there are portions where energy is concentrated at specific frequencies, so it is a signal with high tonality.
  • the high frequency spectrum does not have a portion where energy is concentrated at a specific frequency and is a signal with low tonality, that is, a signal with high noise.
  • A curve C71 represents the low-frequency spectrum obtained by inverse quantization of the quantized low-frequency spectrum, and a curve C72 represents the extended spectrum.
  • The high-frequency spectrum of the original time-series signal has low tonality, but the low-frequency spectrum has high tonality, so the extended spectrum obtained by folding the low-frequency spectrum and adjusting the level using the extension coefficient has high tonality. That is, a characteristic different from that of the original signal appears in the high band as a result of the band extension.
  • The time-series signal (audio signal) obtained by the decoding process may then sound unnatural; for example, a metallic sound may be mixed in.
  • In such a case, the extended spectrum may be generated using random noise, for example as shown in FIG. 15, instead of using the folded low-frequency spectrum as a seed spectrum.
  • the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates the frequency.
  • curves C81 to C83 represent the MDCT spectrum, the low-frequency spectrum obtained by dequantizing the quantized low-frequency spectrum, and the extended spectrum, respectively.
  • the high region of the MDCT spectrum is divided into three bands, band B31 to band B33, and the higher the frequency, the wider the bandwidth.
  • an envelope coefficient is calculated as envelope information indicating the envelope of the band for each band during encoding.
  • the envelope coefficient is an average value of the high-frequency pseudo-amplitude spectrum of each frequency bin in the calculation target band.
  • each of the straight lines L31 to L33 indicates the envelope coefficient calculated for each of the bands B31 to B33.
  • the envelope coefficient is expansion coefficient information for adjusting the level of random noise as a noise signal when generating an extended spectrum.
  • To distinguish it from the extension coefficient calculated from the low-frequency aliasing pseudo-amplitude spectrum and the high-frequency pseudo-amplitude spectrum, this coefficient is referred to as an envelope coefficient.
  • the number of high frequency divisions at the time of calculating the envelope coefficient may be the same as or different from the number of high frequency divisions at the time of calculating the expansion coefficient.
  • the envelope coefficient is quantized and encoded, and multiplexed with the quantized low frequency spectrum and spectrum characteristic code to generate a code string.
  • an extended spectrum is generated using the envelope coefficient acquired from the code string and random noise.
  • A random number normalized to a value in the range of -1.0 to 1.0 is generated for each frequency bin of the bands B31 to B33, which are the extension bands, and the noise signal composed of the random numbers of the frequency bins is used as random noise. The random noise is then multiplied by the envelope coefficient to obtain an extended spectrum.
  • Because the extended spectrum obtained in this way is generated from random noise made of normalized random numbers, its energy is not concentrated at a specific frequency and it is a highly noise-like spectrum, as shown by the curve C83.
  • Because the extended spectrum is obtained by adjusting the level of the random noise using the envelope coefficient, its envelope is close to the high-frequency envelope of the original MDCT spectrum.
  • the time-series signal obtained by decoding has a high low-frequency spectrum tonality and a low high-frequency spectrum tonality, similar to the encoded original input signal.
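The noise-based generation described above can be sketched as follows. This is a minimal illustration rather than the implementation in the specification: the function name, the representation of bands as bin counts, and the `seed` parameter are assumptions introduced here for the example.

```python
import random


def noise_extended_spectrum(band_widths, envelope_coeffs, seed=None):
    """Build an extended spectrum from random noise scaled band by band.

    band_widths     -- number of frequency bins in each extension band (assumed layout)
    envelope_coeffs -- one decoded envelope coefficient per band
    seed            -- optional RNG seed (hypothetical, for reproducibility only)
    """
    if len(band_widths) != len(envelope_coeffs):
        raise ValueError("one envelope coefficient is needed per band")
    rng = random.Random(seed)
    extended = []
    for width, coeff in zip(band_widths, envelope_coeffs):
        for _ in range(width):
            noise = rng.uniform(-1.0, 1.0)  # random number in the range -1.0 to 1.0
            extended.append(noise * coeff)  # level adjusted by the band's envelope coefficient
    return extended


# Three bands that widen with frequency, in the manner of bands B31 to B33.
spectrum = noise_extended_spectrum([4, 8, 16], [0.5, 0.25, 0.125], seed=0)
```

Each output bin is bounded in magnitude by the envelope coefficient of its band, so the noise follows the coarse high-frequency envelope without concentrating energy at any one frequency.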
  • step S91 to step S94 is the same as the processing from step S11 to step S14 in FIG.
  • In step S95, the spectrum characteristic determination unit 25 generates a spectral characteristic code indicating the spectral characteristic, based on the low-frequency spectrum feature quantity supplied from the low-frequency feature quantity extraction unit 23 and the high-frequency spectrum feature quantity supplied from the high-frequency feature quantity extraction unit 24.
  • The spectrum characteristic determination unit 25 generates a spectral characteristic code having a value of “1” when both the SFL, which is the low-frequency spectrum feature quantity, and the SFH, which is the high-frequency spectrum feature quantity, are less than the threshold.
  • the spectrum characteristic code “1” indicates that both low and high frequencies of the input signal (MDCT spectrum) have high tonality as a spectrum characteristic.
  • the spectrum characteristic determining unit 25 generates a spectrum characteristic code having a value of “2” when the SFL that is the low-frequency spectrum feature quantity is less than the threshold value and the SFH that is the high-frequency spectrum feature quantity is equal to or greater than the threshold value.
  • The spectral characteristic code “2” indicates that the low band (low-frequency spectrum) of the input signal has high tonality and the high band (high-frequency spectrum) of the input signal has low tonality, that is, high noise characteristics.
  • the spectrum characteristic determination unit 25 generates a spectrum characteristic code having a value of “0” when the SFL that is the low-frequency spectrum feature amount is equal to or greater than the threshold value.
  • the spectral characteristic code “0” indicates that the input signal has low tonality as a spectral characteristic.
  • the spectrum characteristic determination unit 25 supplies the generated spectrum characteristic code to the extension coefficient calculation unit 26, the extension coefficient quantization unit 27, and the multiplexing unit 28.
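The three-way decision made in step S95 can be written compactly. The sketch below assumes that a single threshold is shared by SFL and SFH, which the text does not state explicitly; the function name is hypothetical.

```python
def spectral_characteristic_code(sfl, sfh, threshold):
    """Map the low-band (SFL) and high-band (SFH) feature quantities to a code.

    A feature quantity below the threshold is treated as high tonality.
    Using one shared threshold for both bands is an assumption of this sketch.
    """
    if sfl >= threshold:
        return 0  # low band is not tonal
    if sfh < threshold:
        return 1  # both low band and high band are tonal
    return 2      # tonal low band, noise-like high band


code = spectral_characteristic_code(sfl=0.1, sfh=0.9, threshold=0.5)
```

Checking SFL first mirrors the text: code “0” depends only on the low-band feature, and SFH is consulted only when the low band is tonal.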
  • In step S96, the extension coefficient calculation unit 26 and the extension coefficient quantization unit 27 determine, based on the spectral characteristic code supplied from the spectrum characteristic determination unit 25, whether the spectral characteristics of both the low band and the high band indicate high tonality.
  • When the value of the spectral characteristic code is “1”, it is determined that the low-band and high-band spectral characteristics indicate high tonality.
  • If it is determined in step S96 that the low-band and high-band spectral characteristics indicate high tonality, the process proceeds to step S97.
  • In step S97, the extension coefficient calculation unit 26 calculates a single extension coefficient for the entire high band, based on the low-frequency aliasing pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies it to the extension coefficient quantization unit 27.
  • In step S97, processing similar to that in step S17 in FIG. 5 is performed.
  • When the extension coefficient has been calculated in step S97, the process proceeds to step S101.
  • If it is determined in step S96 that the low-band and high-band spectral characteristics do not indicate high tonality, the process proceeds to step S98.
  • In step S98, the extension coefficient calculation unit 26 and the extension coefficient quantization unit 27 determine, based on the spectral characteristic code, whether the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality.
  • When the value of the spectral characteristic code is “2”, it is determined that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality.
  • If it is determined in step S98 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the process proceeds to step S99.
  • In step S99, the extension coefficient calculation unit 26 calculates an envelope coefficient for each of the divided high-frequency bands, based on the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies the envelope coefficients to the extension coefficient quantization unit 27.
  • The extension coefficient calculation unit 26 divides the entire high band into the three bands B31 to B33 as shown in FIG. 15, and calculates the average value of the high-frequency pseudo-amplitude spectrum of the frequency bins in each band as the envelope coefficient of that band.
  • After the envelope coefficient is calculated, the process proceeds to step S101.
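The envelope coefficient calculation of step S99 amounts to a per-band average. The following sketch assumes the high-frequency pseudo-amplitude spectrum is available as a list of non-negative values and that bands are described by a hypothetical list of bin-index boundaries; neither representation comes from the specification.

```python
def envelope_coefficients(high_pseudo_amplitude, band_edges):
    """Average the high-frequency pseudo-amplitude spectrum within each band.

    high_pseudo_amplitude -- pseudo-amplitude value per high-band frequency bin
    band_edges            -- bin indices delimiting the bands, e.g. [0, 2, 6]
                             (hypothetical representation of the band split)
    """
    coeffs = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        band = high_pseudo_amplitude[start:stop]
        if not band:
            raise ValueError("each band must contain at least one frequency bin")
        coeffs.append(sum(band) / len(band))
    return coeffs


# Two bands: bins 0-1 and bins 2-5.
coeffs = envelope_coefficients([1.0, 3.0, 4.0, 4.0, 6.0, 6.0], [0, 2, 6])
```

Only the high band enters this calculation, which is consistent with the decoder later scaling random noise rather than a spectrum derived from the low band.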
  • If it is not determined in step S98 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the process proceeds to step S100.
  • In step S100, the extension coefficient calculation unit 26 calculates an extension coefficient for each of the bands into which the high band is divided, based on the low-frequency aliasing pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies the extension coefficients to the extension coefficient quantization unit 27.
  • In step S100, the same processing as in step S18 of FIG. 5 is performed.
  • When the extension coefficients have been calculated in step S100, the process proceeds to step S101.
  • In step S101, the extension coefficient quantization unit 27 quantizes the extension coefficient or envelope coefficient supplied from the extension coefficient calculation unit 26.
  • When the process of step S97 or step S100 has been performed and an extension coefficient is supplied, the extension coefficient quantization unit 27 quantizes the extension coefficient and supplies the resulting quantized extension coefficient to the multiplexing unit 28. When the process of step S99 has been performed and an envelope coefficient is supplied, the extension coefficient quantization unit 27 quantizes the envelope coefficient and supplies the resulting quantized envelope coefficient to the multiplexing unit 28. At this time, for example, scalar quantization or vector quantization is performed on the extension coefficient or the envelope coefficient.
  • In step S102, the multiplexing unit 28 multiplexes the quantized low-frequency spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectrum characteristic determination unit 25, and the quantized extension coefficient or quantized envelope coefficient from the extension coefficient quantization unit 27 to generate a code string.
  • the multiplexing unit 28 performs multiplexing after encoding the quantized low-frequency spectrum and the quantization extension coefficient or the quantization envelope coefficient.
  • the multiplexing unit 28 outputs the code string obtained by multiplexing, and the encoding process ends.
  • the encoding device 11 determines the spectral characteristics of the input signal based on the low-frequency spectrum feature value and the high-frequency spectrum feature value. Then, the encoding device 11 calculates an expansion coefficient or an envelope coefficient as information for obtaining an extended spectrum at the time of decoding according to the spectrum characteristics.
  • In step S141, either the quantized extension coefficient or the quantized envelope coefficient obtained by decomposing the code string is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93.
  • In step S143, the extension coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine, based on the spectral characteristic code supplied from the decomposition unit 91, whether the low-band and high-band spectral characteristics indicate high tonality.
  • the quantization extension coefficient is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93.
  • If it is determined in step S143 that the low-band and high-band spectral characteristics indicate high tonality, the processing in steps S144 and S145 is performed to generate an extended spectrum, which is supplied to the IMDCT unit 95.
  • The processing of steps S144 and S145 is the same as that of steps S54 and S55 of FIG. 7, so its description is omitted.
  • If it is not determined in step S143 that the low-band and high-band spectral characteristics indicate high tonality, the process proceeds to step S146.
  • In step S146, the extension coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine, based on the spectral characteristic code, whether the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality. For example, when the value of the spectral characteristic code is “2”, it is determined that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality.
  • If it is determined in step S146 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the process proceeds to step S147.
  • the quantization envelope coefficient for each high frequency band is supplied from the decomposition unit 91 to the expansion coefficient inverse quantization unit 93.
  • In step S147, the extension coefficient inverse quantization unit 93 inversely quantizes the quantized envelope coefficient of each of the plurality of bands constituting the high band, supplied from the decomposition unit 91, and supplies the resulting envelope coefficients to the extended spectrum generation unit 94. Thereby, for example, the envelope coefficients indicated by the straight lines L31 to L33 for the bands B31 to B33 shown in FIG. 15 are obtained.
  • In step S148, the extended spectrum generation unit 94 generates an extended spectrum based on the envelope coefficient of each band supplied from the extension coefficient inverse quantization unit 93, and supplies the generated spectrum to the IMDCT unit 95.
  • The extended spectrum generation unit 94 assigns a random number normalized to a value in the range of -1.0 to 1.0 to each frequency bin of the extension band to generate random noise, and multiplies the value of each frequency bin of the random noise by the envelope coefficient of the band containing that bin to obtain an extended spectrum.
  • After the extended spectrum is generated, the process proceeds to step S151.
  • If it is not determined in step S146 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the processing in steps S149 and S150 is performed.
  • The quantized extension coefficient for each high-frequency band is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93 and inversely quantized, and an extended spectrum is generated from the resulting extension coefficients and the low-frequency spectrum. Note that the processing in step S149 and step S150 is the same as the processing in step S56 and step S57 in FIG.
  • When the process of step S145, step S148, or step S150 has been performed and an extended spectrum has been generated, the process of step S151 is performed to generate a time-series signal.
  • The process of step S151 is the same as the process of step S58 of FIG. Its description is therefore omitted.
  • When the time-series signal obtained in step S151 is output as the audio signal obtained by decoding, the decoding process ends.
  • the decoding device 81 obtains an extension coefficient or an envelope coefficient corresponding to the spectrum characteristic by decoding and inverse quantization, and generates an extension spectrum using the obtained extension coefficient or envelope coefficient.
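The decoder-side branching on the spectral characteristic code can be sketched as below. Several details are assumptions made for illustration: the seed spectrum is taken to be a mirror image of the top of the low band, level adjustment is taken to be a plain multiplication, and all function and parameter names are hypothetical.

```python
import random


def fold_low_band(low_spectrum, n_bins):
    """Mirror the top n_bins of the low band about its upper edge.

    This mirroring is an assumed way of building the seed spectrum.
    """
    if n_bins > len(low_spectrum):
        raise ValueError("extension band is wider than the low band")
    return [low_spectrum[-1 - i] for i in range(n_bins)]


def generate_extended_spectrum(code, low_spectrum, coeffs, band_widths, rng=None):
    """Return the extended spectrum for spectral characteristic code 0, 1, or 2."""
    if code not in (0, 1, 2):
        raise ValueError("unknown spectral characteristic code")
    n_bins = sum(band_widths)
    if code == 1:
        # Both bands tonal: one extension coefficient covers the whole extension band.
        if len(coeffs) != 1:
            raise ValueError("code 1 expects a single extension coefficient")
        gains = [coeffs[0]] * n_bins
    else:
        # Codes 0 and 2 carry one coefficient per band.
        if len(coeffs) != len(band_widths):
            raise ValueError("one coefficient is needed per band")
        gains = [c for c, w in zip(coeffs, band_widths) for _ in range(w)]
    if code == 2:
        # Tonal low band, noise-like high band: scale random noise by envelope coefficients.
        if rng is None:
            rng = random.Random()
        seed = [rng.uniform(-1.0, 1.0) for _ in range(n_bins)]
    else:
        seed = fold_low_band(low_spectrum, n_bins)
    return [s * g for s, g in zip(seed, gains)]


low = [1.0, 2.0, 3.0, 4.0]
single = generate_extended_spectrum(1, low, [0.5], [2, 2])
```

The result would then be concatenated with the low-frequency spectrum and passed to the inverse transform; that step is outside this sketch.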
  • The level of the seed spectrum or random noise is adjusted using the extension coefficient or envelope coefficient corresponding to the spectral characteristics to obtain an extended spectrum, whereby the high-frequency level can be adjusted in the frequency domain.
  • the series of processes described above can be executed by hardware or can be executed by software.
  • a program constituting the software is installed in the computer.
  • Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
  • FIG. 18 is a block diagram illustrating a configuration example of hardware of a computer that executes the above-described series of processes by a program.
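  • In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are connected to one another by a bus 504.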
  • An input / output interface 505 is further connected to the bus 504.
  • An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input / output interface 505.
  • the input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like.
  • the output unit 507 includes a display, a speaker, and the like.
  • the recording unit 508 includes a hard disk, a nonvolatile memory, and the like.
  • the communication unit 509 includes a network interface or the like.
  • the drive 510 drives a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • The CPU 501 loads, for example, the program recorded in the recording unit 508 into the RAM 503 via the input / output interface 505 and the bus 504 and executes it, whereby the above-described series of processes is performed.
  • the program executed by the computer (CPU 501) can be provided by being recorded on the removable medium 511 as a package medium, for example.
  • the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
  • the program can be installed in the recording unit 508 via the input / output interface 505 by attaching the removable medium 511 to the drive 510. Further, the program can be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.
  • The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at necessary timing, such as when a call is made.
  • the present technology can take a cloud computing configuration in which one function is shared by a plurality of devices via a network and is jointly processed.
  • each step described in the above flowchart can be executed by one device or can be shared by a plurality of devices.
  • When one step includes a plurality of processes, the plurality of processes included in that step can be executed by one device or shared among a plurality of devices.
  • the present technology can be configured as follows.
  • An encoding device including: a feature quantity extraction unit that extracts a feature quantity from a spectrum obtained by orthogonally transforming a time-series signal; a calculation unit that calculates, based on the spectrum and in accordance with the feature quantity, either a single extension coefficient for an extension band different from the low band of the spectrum, used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; and a multiplexing unit that multiplexes a low-frequency spectrum, which is the low-frequency component of the spectrum, and the extension coefficient to generate a code string.
  • The calculation unit calculates the single extension coefficient when the tonality of the spectrum is high, and calculates the extension coefficient for each of the plurality of bands when the tonality of the spectrum is low.
  • [13] The encoding device according to any one of [10] to [12], wherein the calculation unit calculates, as the extension coefficient, a ratio between the average amplitude of the extension band of the spectrum and the average amplitude of the low-frequency spectrum.
  • [14] The encoding device according to [11], wherein the calculation unit calculates envelope information of the extension band of the spectrum as the extension coefficient when the low-band tonality of the spectrum is high and the tonality of the extension band of the spectrum is low.

Abstract

This patent relates to a coding device and method, a decoding device and method, and a program which enable high-quality sound to be obtained even in an environment with small resources. A decomposition unit decomposes a supplied code string to obtain a quantized low-frequency spectrum, a spectral characteristic code, and a quantized extension coefficient. On this occasion, a single quantized extension coefficient or a quantized extension coefficient for each high-frequency band is included in the code string according to the spectral characteristic code. A spectrum inverse quantization unit inversely quantizes the quantized low-frequency spectrum to obtain a low-frequency spectrum, and an extension coefficient inverse quantization unit inversely quantizes the quantized extension coefficient to obtain an extension coefficient. An extended spectrum generation unit generates an extended spectrum on the basis of the low-frequency spectrum and the extension coefficient corresponding to the spectral characteristic code. An IMDCT unit generates a band-extended time-series signal from the low-frequency spectrum and the extended spectrum. This patent is applicable to a decoding device.

Description

Encoding apparatus and method, decoding apparatus and method, and program
The present technology relates to an encoding device and method, a decoding device and method, and a program, and in particular to an encoding device and method, a decoding device and method, and a program that make it possible to obtain high-quality sound even in a low-resource environment.
Conventionally, encoding techniques that incorporate the concept of band extension for audio signals are known (see, for example, Patent Document 1 and Patent Document 2).
In such an encoding technique, a time-series signal input as an audio signal is band-divided into a low-frequency component and a high-frequency component. The low-frequency signal is encoded in the usual manner, and the relationship between the low-frequency and high-frequency signals, the characteristics of the high-frequency signal, and the like are transmitted as additional information.
At the time of decoding, after the low-frequency signal is restored, an extension-band signal is generated using that low-frequency signal and the additional information, and the low-frequency signal and the extension-band signal are combined to achieve band extension.
More specifically, after the low-frequency signal is restored, it is divided into a plurality of bands by a band division filter, and an extension-band signal is generated using the divided low-frequency signals and the additional information. The low-frequency signal and the extension-band signal are then combined by a band synthesis filter to obtain a band-extended time-series signal.
However, when a band division filter and a band synthesis filter are used in this way, the filter processing for band division and band synthesis increases the inherent delay from encoding to decoding of the signal. As a result, the response speed from input to output of the audio signal decreases.
In addition, because filter processing such as band division and band synthesis by a filter bank is required on top of the normal decoding processing, the processing load and memory usage increase significantly, which makes it difficult to implement a decoding device in a low-resource environment such as an embedded device.
To improve on such encoding techniques, a technique that allows band extension to be performed in the frequency domain has been proposed (see, for example, Patent Document 3).
In this technique, the spectrum obtained by MDCT (Modified Discrete Cosine Transform) at the time of encoding is divided into a low-frequency side (baseband) and a high-frequency side (extension band). The baseband signal is encoded in the usual manner, and the relationship between the baseband and extension-band spectra, the characteristics of the extension-band spectrum, and the like are transmitted as additional information.
At the time of decoding, an extension-band spectrum is generated using the baseband spectrum and the additional information, and the baseband spectrum and the extension-band spectrum are combined to generate a full-band spectrum. An IMDCT (Inverse Modified Discrete Cosine Transform) is then applied to the full-band spectrum, which converts it into a time-series signal (time signal).
Patent Document 1: Japanese Patent No. 5329714
Patent Document 2: Japanese Patent No. 5325293
Patent Document 3: JP 2011-215198 A
However, the value of each frequency bin of the spectrum obtained by MDCT (hereinafter also referred to as the MDCT spectrum) incorporates both an amplitude component and a phase component. Therefore, in a technique that performs band extension in the frequency domain, finely adjusting the amplitude of the extension-band spectrum using the MDCT spectrum at the time of decoding greatly disturbs the phase component of each spectrum and the mutual phase relationship between the spectra.
In such a case, when the audio signal to be encoded and decoded is, for example, a highly noise-like signal such as a musical sound or a human voice, no large audible degradation in sound quality occurs.
However, when the audio signal is one in which energy is concentrated at specific frequencies, such as a single musical instrument or a sound effect, that is, a signal with high tonality, the energy that should be concentrated at those frequencies is spread by decoding into the spectrum of the surrounding frequencies. The audio signal finally obtained by decoding then becomes noise-like, and the perceived sound quality deteriorates.
As described above, a technique that performs band extension in the frequency domain requires no band division or band synthesis of the time-series signal, so audio can be encoded and decoded even in a low-resource environment without introducing delay. In some cases, however, high-quality sound cannot be obtained.
The present technology has been made in view of such circumstances and makes it possible to obtain high-quality sound even in a low-resource environment.
A decoding device according to a first aspect of the present technology includes: an acquisition unit that acquires a low-frequency spectrum and either a single extension coefficient for an extension band different from the low band, used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; a generation unit that generates the extended spectrum based on the single extension coefficient or the extension coefficient for each of the plurality of bands; and a synthesis unit that combines the low-frequency spectrum and the extended spectrum.
The generation unit can generate the extended spectrum based on the low-frequency spectrum and the extension coefficient.
The generation unit can generate the extended spectrum by adjusting, based on the extension coefficient, the level of a spectrum obtained from the low-frequency spectrum.
When generating the extended spectrum based on the single extension coefficient, the generation unit can adjust the level of the entire extension band of the spectrum based on that extension coefficient. When generating the extended spectrum based on the extension coefficient for each of the plurality of bands, the generation unit can adjust the level of each band of the spectrum based on the extension coefficient of that band.
The generation unit can generate the extended spectrum by adjusting the level of predetermined noise based on the extension coefficient.
The value of the low-frequency spectrum can be determined by the amplitude component and the phase component of the original time-series signal.
The low-frequency spectrum can be an MDCT spectrum.
A decoding method or program according to the first aspect of the present technology includes the steps of: acquiring a low-frequency spectrum and either a single extension coefficient for an extension band different from the low band, used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; generating the extended spectrum based on the single extension coefficient or the extension coefficient for each of the plurality of bands; and combining the low-frequency spectrum and the extended spectrum.
In the first aspect of the present technology, a low-frequency spectrum and either a single extension coefficient for an extension band different from the low band, used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band are acquired; the extended spectrum is generated based on the single extension coefficient or the extension coefficient for each of the plurality of bands; and the low-frequency spectrum and the extended spectrum are combined.
 An encoding device according to the second aspect of the present technology includes: a feature quantity extraction unit that extracts a feature quantity from a spectrum obtained by orthogonally transforming a time-series signal; a calculation unit that, depending on the feature quantity, calculates based on the spectrum either a single extension coefficient for an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, the extension coefficient being used to obtain an extended spectrum of the extension band; and a multiplexing unit that multiplexes the low-band spectrum, which is the low-band component of the spectrum, with the extension coefficient to generate a code string.
 The feature quantity can be information indicating the tonality of the spectrum.
 The calculation unit can calculate the single extension coefficient when the tonality of the spectrum is high, and can calculate the extension coefficient for each of the plurality of bands when the tonality of the spectrum is low.
 The calculation unit can calculate, as the extension coefficient, the ratio between the average amplitude of the extension band of the spectrum and the average amplitude of the low-band spectrum.
 The calculation unit can calculate, as the extension coefficient, envelope information of the extension band of the spectrum when the tonality of the low band of the spectrum is high and the tonality of the extension band of the spectrum is low.
 The value of the spectrum can be determined by the amplitude component and the phase component of the time-series signal.
 The orthogonal transform can be the MDCT.
 An encoding method or program according to the second aspect of the present technology includes the steps of: extracting a feature quantity from a spectrum obtained by orthogonally transforming a time-series signal; calculating, depending on the feature quantity and based on the spectrum, either a single extension coefficient for an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, the extension coefficient being used to obtain an extended spectrum of the extension band; and multiplexing the low-band spectrum, which is the low-band component of the spectrum, with the extension coefficient to generate a code string.
 In the second aspect of the present technology, a feature quantity is extracted from a spectrum obtained by orthogonally transforming a time-series signal; depending on the feature quantity, either a single extension coefficient for an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, is calculated based on the spectrum, the extension coefficient being used to obtain an extended spectrum of the extension band; and the low-band spectrum, which is the low-band component of the spectrum, is multiplexed with the extension coefficient to generate a code string.
 According to the first and second aspects of the present technology, high-quality sound can be obtained even in a low-resource environment.
 Note that the effects are not necessarily limited to those described here and may be any of the effects described in the present disclosure.
 A diagram showing a configuration example of an encoding device.
 A diagram explaining the regions and boundaries of a spectrum.
 A diagram explaining the low-band folded pseudo amplitude spectrum.
 A diagram explaining the division of the high-band spectrum.
 A flowchart explaining the encoding process.
 A diagram showing a configuration example of a decoding device.
 A flowchart explaining the decoding process.
 A diagram explaining a signal with high tonality.
 A diagram explaining the average value of the high-band pseudo amplitude spectrum.
 A diagram explaining level adjustment of the extended spectrum.
 A diagram explaining the loss of tonality caused by level adjustment.
 A diagram explaining level adjustment of the extended spectrum.
 A diagram showing an example of a signal with high tonality in the low band and low tonality in the high band.
 A diagram explaining generation of the extended spectrum and sound quality degradation.
 A diagram explaining envelope coefficients and generation of the extended spectrum.
 A flowchart explaining the encoding process.
 A flowchart explaining the decoding process.
 A diagram showing a configuration example of a computer.
 Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.
<First Embodiment>
<Configuration example of encoding device>
 FIG. 1 is a diagram showing a configuration example of an embodiment of an encoding device to which the present technology is applied.
 The encoding device 11 shown in FIG. 1 includes an MDCT unit 21, a spectrum quantization unit 22, a low-band feature quantity extraction unit 23, a high-band feature quantity extraction unit 24, a spectrum characteristic determination unit 25, an extension coefficient calculation unit 26, an extension coefficient quantization unit 27, and a multiplexing unit 28.
 An input signal, for example a time-series signal with a sampling frequency Fs [kHz], is supplied to the MDCT unit 21 as the audio signal to be encoded.
 The MDCT unit 21 performs an orthogonal transform, for example the MDCT, on the supplied input signal and obtains a spectrum from the frequency Dc [kHz], which is the DC component, up to the frequency Fs/2, which is half the sampling frequency Fs.
 Although the following description uses the MDCT as the orthogonal transform, any transform may be used, not only the MDCT, as long as each spectrum value obtained by the transform incorporates both an amplitude component and a phase component.
 Here, to improve coding efficiency, only the components of the spectrum from the frequency Dc up to a perceptually sensitive frequency Fc [kHz] are actually encoded, and the rest of the spectrum is discarded. That is, the portion of the spectrum from the frequency Fc to the frequency Fs/2 is discarded.
 To improve coding efficiency further, band extension is performed on the decoding side.
 For example, as shown in FIG. 2, the spectrum obtained by the orthogonal transform in the MDCT unit 21 is divided into a low-band spectrum, a high-band spectrum, and a discarded spectrum. In FIG. 2, the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates frequency.
 In this example, the components of the whole spectrum from the frequency Dc, which is the DC component, up to an upper limit frequency Fb [kHz] form the low-band spectrum, and ordinary encoding is applied to the low-band spectrum when the input signal is encoded.
 The components of the whole spectrum from the upper limit frequency Fb to the frequency Fc form the high-band spectrum. The high-band spectrum is not encoded when the input signal is encoded. At decoding, a pseudo high-band spectrum (hereinafter also called the extended spectrum) is generated from the low-band spectrum and the extension coefficient, which is additional information described later, and band extension is thereby realized. That is, at decoding, the frequency band from the upper limit frequency Fb to the frequency Fc is the extension band targeted by band extension.
 Furthermore, the portion of the whole spectrum from the frequency Fc to the frequency Fs/2 is treated as the discarded spectrum and is discarded.
 In the following, the band from the frequency Dc to the upper limit frequency Fb is called the low band, and the band from the upper limit frequency Fb to the frequency Fc is called the high band. The band from the frequency Fc to the frequency Fs/2 is called the discarded band.
 Therefore, in this example, only the low-band component of the input signal is encoded, and the high-band component is generated by band extension at decoding.
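The three-way split of the spectrum can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_bands` is hypothetical, the bins are assumed to be uniformly spaced from Dc to Fs/2, and the rounding rule for mapping Fb and Fc to bin indices is not taken from the patent.

```python
def split_bands(spectrum, fs_khz, fb_khz, fc_khz):
    """Split a spectrum covering Dc..Fs/2 into low, high, and discarded parts."""
    n = len(spectrum)
    nyquist = fs_khz / 2.0
    # Map a frequency in kHz to a bin index (uniform bin spacing is assumed).
    kb = int(round(n * fb_khz / nyquist))
    kc = int(round(n * fc_khz / nyquist))
    low = spectrum[:kb]        # Dc..Fb: encoded in the ordinary way
    high = spectrum[kb:kc]     # Fb..Fc: extension band, rebuilt at decoding
    discarded = spectrum[kc:]  # Fc..Fs/2: dropped
    return low, high, discarded


# Example with 16 bins, Fs = 48 kHz, Fb = 6 kHz, Fc = 18 kHz (illustrative values).
low, high, discarded = split_bands(list(range(16)), 48.0, 6.0, 18.0)
```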
 Returning to FIG. 1, the MDCT unit 21 performs the MDCT on the input signal, supplies the low-band spectrum of the resulting full-band spectrum to the spectrum quantization unit 22 and the low-band feature quantity extraction unit 23, and supplies the high-band spectrum to the high-band feature quantity extraction unit 24.
 The spectrum quantization unit 22 quantizes the low-band spectrum supplied from the MDCT unit 21 and supplies the resulting quantized low-band spectrum to the multiplexing unit 28.
 The low-band feature quantity extraction unit 23 extracts a feature quantity (hereinafter also called the low-band spectrum feature quantity) from the low-band spectrum supplied from the MDCT unit 21 and supplies it to the spectrum characteristic determination unit 25. It also supplies amplitude information of the low-band spectrum to the extension coefficient calculation unit 26.
 The high-band feature quantity extraction unit 24 extracts a feature quantity (hereinafter also called the high-band spectrum feature quantity) from the high-band spectrum supplied from the MDCT unit 21 and supplies it to the spectrum characteristic determination unit 25. It also supplies amplitude information of the high-band spectrum to the extension coefficient calculation unit 26.
 The low-band spectrum feature quantity and the high-band spectrum feature quantity are described next.
 To extract feature quantities such as the low-band spectrum feature quantity and the high-band spectrum feature quantity from the spectrum obtained by the MDCT unit 21, the amplitude characteristics of the spectrum must be observed. However, the spectrum obtained by the MDCT unit 21 is, for example, an MDCT spectrum obtained by the MDCT, and an MDCT spectrum has properties different from those of a DFT spectrum obtained by the DFT (Discrete Fourier Transform). The MDCT spectrum is also called the MDCT coefficients.
 Specifically, a DFT spectrum contains the amplitude component and the phase component independently of each other. In contrast, the value of the MDCT spectrum, that is, the value at each frequency bin of the MDCT spectrum, incorporates both the amplitude component and the phase component. In other words, the value of the MDCT spectrum is determined by the amplitude and phase components of the input signal, and neither component alone can be recovered from the MDCT spectrum value.
 Therefore, with a DFT spectrum the signal amplitude can be observed using the amplitude spectrum or the power spectrum, whereas with an MDCT spectrum it is difficult to observe the signal amplitude from the MDCT spectrum as it is.
 One conceivable approach is therefore to apply the IMDCT, the inverse of the MDCT, to the MDCT spectrum to return the input signal to a time-series signal, and then to apply the DFT to that time-series signal for feature quantity extraction.
 In that case, however, IMDCT and DFT processing blocks must be added to the encoding device 11, and a large increase is expected in both the amount of computation and the usage of memory such as ROM (Read Only Memory) and RAM (Random Access Memory). It then becomes difficult to encode the input signal in a low-resource environment with limited computing resources, such as a portable device.
 The encoding device 11 to which the present technology is applied therefore calculates a pseudo amplitude spectrum S_k from the MDCT spectrum by the following equation (1) and uses it for feature quantity extraction.
[Equation (1): equation image not reproduced]
 In equation (1), S_k denotes the pseudo amplitude spectrum corresponding to the k-th frequency bin of the MDCT spectrum, and y_k denotes the value of the MDCT spectrum at the k-th frequency bin. Equation (1) therefore computes the pseudo amplitude spectrum S_k for one frequency bin from the MDCT spectrum values of three consecutive frequency bins.
 The value of the pseudo amplitude spectrum S_k obtained in this way is similar to an amplitude spectrum. That is, because S_k correlates strongly with the amplitude spectrum of the DFT spectrum, S_k can be said to indicate a pseudo amplitude value at each frequency of the MDCT spectrum.
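As an illustration of a three-bin pseudo amplitude computation, the sketch below uses a form commonly used for MDCT pseudo-spectra, S_k = sqrt(y_k^2 + (y_{k+1} - y_{k-1})^2). This form is an assumption: it matches the description of using three consecutive bins, but it may differ from the exact expression of equation (1). The zero padding at the two edges and the function name `pseudo_amplitude` are also assumptions.

```python
import math


def pseudo_amplitude(y):
    """Return a pseudo amplitude value for every MDCT bin in y.

    Assumed form: S_k = sqrt(y_k^2 + (y_{k+1} - y_{k-1})^2).
    Bins outside the array are treated as zero (an assumption).
    """
    n = len(y)
    s = []
    for k in range(n):
        prev = y[k - 1] if k > 0 else 0.0      # y_{k-1}
        nxt = y[k + 1] if k < n - 1 else 0.0   # y_{k+1}
        s.append(math.sqrt(y[k] ** 2 + (nxt - prev) ** 2))
    return s


s = pseudo_amplitude([3.0, 0.0, 4.0])
```

Only multiplications, additions, and one square root per bin are needed, which is consistent with the stated goal of avoiding extra IMDCT and DFT blocks.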
 In the following, the pseudo amplitude spectrum obtained for the low-band spectrum is also called the low-band pseudo amplitude spectrum, and the pseudo amplitude spectrum obtained for the high-band spectrum is also called the high-band pseudo amplitude spectrum.
 The low-band feature quantity extraction unit 23 and the high-band feature quantity extraction unit 24 calculate the pseudo amplitude spectrum S_k by equation (1) for each frequency (frequency bin) of the low-band spectrum and the high-band spectrum, and calculate a feature quantity from the pseudo amplitude spectrum S_k of each frequency bin.
 For example, the low-band feature quantity extraction unit 23 and the high-band feature quantity extraction unit 24 calculate, as the low-band spectrum feature quantity and the high-band spectrum feature quantity, the Spectral Flatness (hereinafter also called SF), which is an index of how noise-like the spectrum is, by the following equation (2).
$$SF = \frac{\left(\prod_{i=0}^{N-1} S_i\right)^{1/N}}{\frac{1}{N}\sum_{i=0}^{N-1} S_i} \tag{2}$$
 In equation (2), N denotes the number of spectrum values considered, that is, the number of frequency bins, and S_i denotes the value of the pseudo amplitude spectrum at the i-th frequency bin.
 Therefore, for example, when SF is obtained for the high-band spectrum, SF is the ratio of the geometric mean to the arithmetic mean of the pseudo amplitude spectrum S_k obtained for all frequency bins of the high-band spectrum.
 The SF calculated in this way indicates the degree of flatness of the spectrum and takes a value in the range of 0.0 to 1.0.
 For example, the larger the SF value, that is, the closer it is to 1.0, the smaller the undulation of the spectrum and the flatter it is, which indicates that the spectrum is more noise-like. Conversely, the smaller the SF value, that is, the closer it is to 0.0, the higher the tonality of the spectrum (the less noise-like it is).
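The flatness measure can be sketched as the geometric mean divided by the arithmetic mean, which is the form that yields values between 0.0 and 1.0 with 1.0 for a flat spectrum. The function name `spectral_flatness` and the small floor `eps`, which guards against taking the logarithm of zero, are assumptions added for this illustration.

```python
import math


def spectral_flatness(s, eps=1e-12):
    """Geometric mean over arithmetic mean of pseudo amplitude values."""
    n = len(s)
    # Geometric mean computed through the mean of logarithms for stability;
    # eps is an assumed floor so that a zero-valued bin does not break log().
    log_mean = sum(math.log(max(v, eps)) for v in s) / n
    geometric = math.exp(log_mean)
    arithmetic = sum(s) / n
    return geometric / arithmetic if arithmetic > 0 else 0.0


flat_sf = spectral_flatness([2.0, 2.0, 2.0, 2.0])      # close to 1.0 (noise-like)
tonal_sf = spectral_flatness([100.0, 0.001, 0.001, 0.001])  # close to 0.0 (tonal)
```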
 Although an example in which SF is calculated as the feature quantity has been described, any feature quantity may be calculated.
 For example, there are indices other than SF that indicate how noise-like a spectrum is, in other words, how tonal it is. Another such index may therefore be calculated as the feature quantity, depending on the accuracy of the feature quantity required by the encoding device 11 and the allowable amount of computation.
 As an example of a feature quantity different from SF, the spectrum concentration degree D given by the following equation (3) may be calculated as the low-band spectrum feature quantity or the high-band spectrum feature quantity.
$$D = \frac{\mathrm{Max}(S_i)}{\frac{1}{N}\sum_{i=0}^{N-1} S_i} \tag{3}$$
 In equation (3), N denotes the number of spectrum values considered, that is, the number of frequency bins. S_i denotes the value of the pseudo amplitude spectrum corresponding to the i-th frequency bin, and Max(S_i) denotes the maximum value among the pseudo amplitude spectrum values S_i of the frequency bins.
 Therefore, in the example of equation (3), the spectrum concentration degree D is the ratio of the maximum value of the pseudo amplitude spectrum S_k to the arithmetic mean of the pseudo amplitude spectrum S_k.
 For an MDCT spectrum, the larger the value of the spectrum concentration degree D, the more unevenly the spectrum is distributed and the higher the tonality tends to be. Conversely, the smaller the value of D, the flatter the spectrum distribution and the more noise-like it tends to be.
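A sketch of the concentration measure follows, taking D as the maximum value divided by the arithmetic mean, which is the direction consistent with larger D meaning higher tonality. The function name `spectrum_concentration` and the handling of an all-zero input are assumptions.

```python
def spectrum_concentration(s):
    """Maximum pseudo amplitude divided by the arithmetic mean."""
    mean = sum(s) / len(s)
    # Returning 0.0 for an all-zero spectrum is an assumed convention.
    return max(s) / mean if mean > 0 else 0.0


flat_d = spectrum_concentration([1.0, 1.0, 1.0, 1.0])   # flat spectrum
peaky_d = spectrum_concentration([8.0, 0.0, 0.0, 0.0])  # one dominant bin
```

Compared with SF, this index needs no logarithm or exponential, which may matter when the allowable amount of computation is small.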
 As described above, any feature quantity may be calculated, but the following description assumes that SF is calculated as the feature quantity.
 Specifically, when the low-band feature quantity extraction unit 23 calculates the low-band spectrum feature quantity, it calculates the SF described above for the low-band folded pseudo amplitude spectrum. As shown in FIG. 3, this spectrum is obtained by folding the low-band pseudo amplitude spectrum calculated for the low-band spectrum toward the high-band side about the upper limit frequency Fb.
 In FIG. 3, the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates frequency.
 In this example, the low-band pseudo amplitude spectrum represented by the curve C11 is folded toward the high-band side at the position of the upper limit frequency Fb to give the low-band folded pseudo amplitude spectrum represented by the curve C12. The low-band pseudo amplitude spectrum and the low-band folded pseudo amplitude spectrum are therefore mirror images of each other about the upper limit frequency Fb.
 Returning to FIG. 1, the low-band feature quantity extraction unit 23 calculates SF as the low-band spectrum feature quantity by equation (2) over the frequency bins of the band from the upper limit frequency Fb to the frequency Fc in the low-band folded pseudo amplitude spectrum obtained by the folding. In the following, the SF calculated as the low-band spectrum feature quantity is also called SFL.
 The low-band feature quantity extraction unit 23 supplies the SFL obtained in this way to the spectrum characteristic determination unit 25 as the low-band spectrum feature quantity, and supplies the low-band folded pseudo amplitude spectrum to the extension coefficient calculation unit 26 as amplitude information. At this time, for example, the portion of the low-band folded pseudo amplitude spectrum from the upper limit frequency Fb to the frequency Fc is supplied to the extension coefficient calculation unit 26.
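The folding about Fb can be sketched as a reversal of the low-band values, keeping only as many bins as the band from Fb to Fc contains. The function name `fold_low_band` is hypothetical, and the sketch assumes the low band has at least as many bins as the extension band.

```python
def fold_low_band(low_pseudo, num_high_bins):
    """Mirror the low-band pseudo amplitude spectrum about its upper edge (Fb).

    The first folded bin (just above Fb) corresponds to the last low-band bin
    (just below Fb), and so on downward.
    """
    if num_high_bins > len(low_pseudo):
        # Assumed restriction: the low band must be wide enough to fill Fb..Fc.
        raise ValueError("low band is narrower than the extension band")
    mirrored = low_pseudo[::-1]
    return mirrored[:num_high_bins]


folded = fold_low_band([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```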
 The high-band feature quantity extraction unit 24 calculates SF as the high-band spectrum feature quantity by equation (2) over the frequency bins of the high-band pseudo amplitude spectrum obtained from the high-band spectrum. In the following, the SF calculated as the high-band spectrum feature quantity is also called SFH.
 The high-band feature quantity extraction unit 24 supplies the SFH obtained in this way to the spectrum characteristic determination unit 25 as the high-band spectrum feature quantity, and supplies the high-band pseudo amplitude spectrum to the extension coefficient calculation unit 26 as amplitude information.
 The spectrum characteristic determination unit 25 generates a spectrum characteristic code indicating the spectrum characteristic of the input signal to be encoded, based on the low-band spectrum feature quantity supplied from the low-band feature quantity extraction unit 23 and the high-band spectrum feature quantity supplied from the high-band feature quantity extraction unit 24.
 For example, when SFL, the low-band spectrum feature quantity, and SFH, the high-band spectrum feature quantity, are both less than a predetermined threshold, the spectrum characteristic code is a code indicating high tonality. That is, the input signal (MDCT spectrum) is regarded as having the spectrum characteristic of high tonality. Here, the value of the spectrum characteristic code indicating high tonality is "1".
 When at least one of SFL and SFH is equal to or greater than the threshold, the spectrum characteristic code is a code indicating that the tonality is not high. That is, the input signal is regarded as having the spectrum characteristic that its tonality is not high, in other words, that it is noise-like. Here, the value of the spectrum characteristic code indicating that the tonality is not high is "0".
 In this way, the spectrum characteristic code is "1" when the tonality is high in both the low-band and high-band components of the MDCT spectrum, and "0" when at least one of the low-band and high-band components of the MDCT spectrum is noise-like.
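The decision rule can be written compactly as below. The function name `spectral_characteristic_code` is hypothetical, and the threshold used in the example is an arbitrary illustrative value, since the text does not give one.

```python
def spectral_characteristic_code(sfl, sfh, threshold):
    """Return 1 when both bands are tonal (SF below threshold), else 0."""
    return 1 if (sfl < threshold and sfh < threshold) else 0


# Illustrative threshold of 0.3 (not taken from the patent).
both_tonal = spectral_characteristic_code(0.1, 0.2, 0.3)
noisy_high = spectral_characteristic_code(0.1, 0.5, 0.3)
```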
 The spectrum characteristic determination unit 25 supplies the spectrum characteristic code obtained in this way to the extension coefficient calculation unit 26, the extension coefficient quantization unit 27, and the multiplexing unit 28.
 The extension coefficient calculation unit 26 calculates the extension coefficient based on the low-band folded pseudo amplitude spectrum from the low-band feature quantity extraction unit 23, the high-band pseudo amplitude spectrum from the high-band feature quantity extraction unit 24, and the spectrum characteristic code from the spectrum characteristic determination unit 25, and supplies it to the extension coefficient quantization unit 27.
 Here, the extension coefficient is information for adjusting the level of the high band in the frequency domain at decoding, and indicates the ratio of the level of the high-band pseudo amplitude spectrum to the level of the low-band folded pseudo amplitude spectrum. In other words, the extension coefficient indicates the ratio of the average amplitude of the high-band spectrum to the average amplitude of the low-band spectrum.
 Specifically, when the spectrum characteristic code is "1", the extension coefficient calculation unit 26 calculates the average of the high-band pseudo amplitude spectrum values over the frequency bins of the high band, that is, the band from the upper limit frequency Fb to the frequency Fc. The extension coefficient calculation unit 26 also calculates the average of the low-band folded pseudo amplitude spectrum values over the frequency bins of the band from the upper limit frequency Fb to the frequency Fc, and takes as the extension coefficient the value obtained by dividing the average of the high-band pseudo amplitude spectrum by the average of the low-band folded pseudo amplitude spectrum. In this case, one extension coefficient is obtained for the entire high band, that is, for the entire extension band.
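The single-coefficient case reduces to a ratio of two averages over the band from Fb to Fc. In this sketch the function name `single_extension_coefficient` is hypothetical, and returning 0.0 when the low-band average is zero is an assumed safeguard that the text does not specify.

```python
def single_extension_coefficient(high_pseudo, folded_low_pseudo):
    """Average high-band level divided by average folded low-band level."""
    mean_high = sum(high_pseudo) / len(high_pseudo)
    mean_low = sum(folded_low_pseudo) / len(folded_low_pseudo)
    # Assumed safeguard against division by zero.
    return mean_high / mean_low if mean_low > 0 else 0.0


coeff = single_extension_coefficient([1.0, 1.0, 1.0, 1.0], [2.0, 4.0, 2.0, 4.0])
```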
In contrast, when the spectral characteristic code is "0", the expansion coefficient calculation unit 26 divides the high band into a plurality of bands and calculates an expansion coefficient for each band. As shown for example in FIG. 4, the division takes human auditory characteristics into account, so that the divided bands become wider from the low-frequency side toward the high-frequency side.
In FIG. 4, the vertical axis indicates the spectrum value, that is, the level, and the horizontal axis indicates frequency.
In this example, the frequency band of the high-frequency spectrum, that is, the high band from the upper limit frequency Fb to the frequency Fc, is divided into five bands, band B1 through band B5. The bands obtained by the division are wider the closer they are to the frequency Fc.
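The band division described above can be sketched as follows. This is only an illustration: the text requires that the bands widen toward Fc but does not give the band edges, so the function name `split_high_band`, the geometric growth rule, and the `growth` parameter are all assumptions.

```python
def split_high_band(start_bin, end_bin, num_bands=5, growth=1.5):
    """Split bins [start_bin, end_bin) into contiguous bands that widen
    toward the high-frequency end.

    The geometric growth rule is a hypothetical choice; the source only
    states that bands nearer Fc are wider. For very narrow ranges the
    rounding below may produce empty bands, which this sketch ignores.
    """
    total = end_bin - start_bin
    weights = [growth ** i for i in range(num_bands)]
    scale = total / sum(weights)
    edges = [start_bin]
    acc = 0.0
    for w in weights[:-1]:
        acc += w * scale
        edges.append(start_bin + round(acc))
    edges.append(end_bin)
    # Each band is a half-open bin range (first_bin, one_past_last_bin).
    return list(zip(edges[:-1], edges[1:]))


bands = split_high_band(64, 128)  # five bands standing in for B1..B5
```

With these assumed parameters the five bands cover the whole range from bin 64 to bin 128 without gaps, and each band is wider than the one below it.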
For each of the bands B1 through B5 constituting the high band, the expansion coefficient calculation unit 26 divides the average of the high-frequency pseudo-amplitude spectrum values by the average of the low-frequency folded pseudo-amplitude spectrum values, and uses the result as the expansion coefficient of that band.
For example, the value obtained by dividing the average of the high-frequency pseudo-amplitude spectrum values at the frequency bins in band B1 by the average of the low-frequency folded pseudo-amplitude spectrum values at the frequency bins in band B1 is used as the expansion coefficient of band B1.
Accordingly, the expansion coefficient C_i of the i-th band (region) obtained by dividing the high band is calculated by the following equation (4).
$$C_i = \frac{\dfrac{1}{M}\sum_{k=1}^{M} S_k}{\dfrac{1}{M}\sum_{k=1}^{M} L_k} \qquad (4)$$
In equation (4), S_k denotes the value of the high-frequency pseudo-amplitude spectrum at the k-th frequency bin in the i-th band, and L_k denotes the value of the low-frequency folded pseudo-amplitude spectrum at the k-th frequency bin in the i-th band. M denotes the number of spectral values in the i-th band, that is, the number of frequency bins.
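As a minimal sketch of the ratio of averages defined by equation (4), the following Python computes one coefficient for a set of bins and then one coefficient per band. The function names, the list-based representation, and the handling of an all-zero denominator are assumptions made for illustration; band ranges are half-open bin indices counted from the start of the high band.

```python
def expansion_coefficient(high, folded_low):
    """Mean of the high-band pseudo-amplitudes S_k divided by the mean of
    the folded low-band pseudo-amplitudes L_k over the same M bins."""
    if not high or len(high) != len(folded_low):
        raise ValueError("both inputs must cover the same non-empty bin range")
    m = len(high)
    mean_high = sum(high) / m
    mean_low = sum(folded_low) / m
    if mean_low == 0.0:
        # Assumption: the source does not say what to do for a silent band.
        return 0.0
    return mean_high / mean_low


def expansion_coefficients(high, folded_low, bands):
    """One coefficient per band; bands are (start, stop) bin ranges."""
    return [expansion_coefficient(high[a:b], folded_low[a:b]) for a, b in bands]


high = [1.0, 3.0, 6.0, 6.0]
folded_low = [2.0, 2.0, 2.0, 4.0]
single = expansion_coefficient(high, folded_low)                       # code "1"
per_band = expansion_coefficients(high, folded_low, [(0, 2), (2, 4)])  # code "0"
```

Passing the whole high band as a single range corresponds to the case where the spectral characteristic code is "1", and passing the divided bands corresponds to the case where it is "0".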
The expansion coefficient quantization unit 27 quantizes the expansion coefficients supplied from the expansion coefficient calculation unit 26 on the basis of the spectral characteristic code supplied from the spectral characteristic determination unit 25, and supplies the resulting quantized expansion coefficients to the multiplexing unit 28.
For example, when the spectral characteristic code is "1", scalar quantization is performed on the single expansion coefficient calculated for the entire high band. In contrast, when the spectral characteristic code is "0", scalar quantization or vector quantization is performed on the plurality of expansion coefficients calculated for the respective divided bands (regions) in the high band.
The multiplexing unit 28 multiplexes the quantized low-frequency spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectral characteristic determination unit 25, and the quantized expansion coefficients from the expansion coefficient quantization unit 27, and outputs the resulting code string. At this time, the multiplexing unit 28 entropy-codes the quantized low-frequency spectrum and also encodes the quantized expansion coefficients.
<Description of encoding process>
Next, the operation of the encoding device 11 will be described.
When, for example, an input signal to be encoded is supplied from outside, the encoding device 11 starts the encoding process and encodes the input signal. The encoding process performed by the encoding device 11 is described below with reference to the flowchart of FIG. 5.
In step S11, the MDCT unit 21 performs MDCT on the supplied input signal. The MDCT unit 21 then supplies the low-frequency part of the resulting MDCT spectrum, as the low-frequency spectrum, to the spectrum quantization unit 22 and the low-frequency feature quantity extraction unit 23, and supplies the high-frequency part of the MDCT spectrum, as the high-frequency spectrum, to the high-frequency feature quantity extraction unit 24.
In step S12, the spectrum quantization unit 22 quantizes the low-frequency spectrum supplied from the MDCT unit 21 and supplies the resulting quantized low-frequency spectrum to the multiplexing unit 28.
In step S13, the low-frequency feature quantity extraction unit 23 extracts a low-frequency spectrum feature quantity from the low-frequency spectrum supplied from the MDCT unit 21.
For example, the low-frequency feature quantity extraction unit 23 evaluates equation (1) described above for each frequency bin of the low-frequency spectrum to calculate the low-frequency pseudo-amplitude spectrum.
The low-frequency feature quantity extraction unit 23 also folds the obtained low-frequency pseudo-amplitude spectrum over to the high-frequency side at the upper limit frequency Fb to obtain the low-frequency folded pseudo-amplitude spectrum. At this time, for example, the low-frequency feature quantity extraction unit 23 discards the portion of the folded low-frequency pseudo-amplitude spectrum that lies above the frequency Fc.
The low-frequency feature quantity extraction unit 23 then evaluates equation (2) described above over the frequency bins of the low-frequency folded pseudo-amplitude spectrum to calculate SFL as the low-frequency spectrum feature quantity.
The low-frequency feature quantity extraction unit 23 supplies the calculated SFL, as the low-frequency spectrum feature quantity, to the spectral characteristic determination unit 25, and supplies the low-frequency folded pseudo-amplitude spectrum to the expansion coefficient calculation unit 26.
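The folding operation in step S13 can be sketched as a mirror of the low-band values about their upper edge followed by truncation at Fc. The function name and the exact bin alignment at the fold point (the bin just below Fb maps to the bin just above Fb) are assumptions; the text does not specify them.

```python
def fold_low_band(low_pseudo_amplitude, num_high_bins):
    """Mirror the low-band pseudo-amplitude spectrum about Fb and keep only
    the part that falls between Fb and Fc.

    low_pseudo_amplitude[-1] is assumed to be the bin just below Fb, so the
    first folded value lands on the bin just above Fb.
    """
    if num_high_bins > len(low_pseudo_amplitude):
        # Assumption: the low band is at least as wide as the high band.
        raise ValueError("folded spectrum does not reach Fc")
    folded = low_pseudo_amplitude[::-1]
    # Discard everything that would fall above Fc.
    return folded[:num_high_bins]


folded = fold_low_band([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

In this example the five low-band values are reversed and only the three values that land between Fb and Fc are kept.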
In step S14, the high-frequency feature quantity extraction unit 24 extracts a high-frequency spectrum feature quantity from the high-frequency spectrum supplied from the MDCT unit 21.
For example, the high-frequency feature quantity extraction unit 24 evaluates equation (1) described above for each frequency bin of the high-frequency spectrum to calculate the high-frequency pseudo-amplitude spectrum, and evaluates equation (2) over the frequency bins of the high-frequency pseudo-amplitude spectrum to calculate SFH as the high-frequency spectrum feature quantity.
The high-frequency feature quantity extraction unit 24 supplies the calculated SFH, as the high-frequency spectrum feature quantity, to the spectral characteristic determination unit 25, and supplies the high-frequency pseudo-amplitude spectrum to the expansion coefficient calculation unit 26.
In step S15, the spectral characteristic determination unit 25 generates a spectral characteristic code indicating the spectral characteristic on the basis of the low-frequency spectrum feature quantity supplied from the low-frequency feature quantity extraction unit 23 and the high-frequency spectrum feature quantity supplied from the high-frequency feature quantity extraction unit 24.
Specifically, when SFL, the low-frequency spectrum feature quantity, and SFH, the high-frequency spectrum feature quantity, are both less than a threshold, the spectral characteristic determination unit 25 generates a spectral characteristic code with the value "1".
In contrast, when at least one of SFL and SFH is greater than or equal to the threshold, the spectral characteristic determination unit 25 generates a spectral characteristic code with the value "0".
The spectral characteristic determination unit 25 supplies the generated spectral characteristic code to the expansion coefficient calculation unit 26, the expansion coefficient quantization unit 27, and the multiplexing unit 28.
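The decision rule of step S15 is small enough to state directly in code. In this sketch the function name is hypothetical, and a single threshold shared by SFL and SFH is an assumption; the threshold value itself is not given in this passage.

```python
def spectral_characteristic_code(sfl, sfh, threshold):
    """Return 1 (high tonality) only when both feature quantities are
    below the threshold, otherwise 0 (noise-like)."""
    return 1 if (sfl < threshold and sfh < threshold) else 0


# Illustrative values only; 0.3 is not a threshold taken from the source.
code_tonal = spectral_characteristic_code(0.1, 0.2, 0.3)
code_noisy = spectral_characteristic_code(0.1, 0.5, 0.3)
```

A value exactly equal to the threshold yields "0", matching the "greater than or equal to" condition in the text.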
In step S16, the expansion coefficient calculation unit 26 and the expansion coefficient quantization unit 27 determine, on the basis of the spectral characteristic code supplied from the spectral characteristic determination unit 25, whether the spectral characteristic indicates high tonality.
For example, when the value of the spectral characteristic code is "1", the spectral characteristic is determined to indicate high tonality.
If it is determined in step S16 that the spectral characteristic indicates high tonality, the process proceeds to step S17.
In step S17, the expansion coefficient calculation unit 26 calculates a single expansion coefficient for the entire high band on the basis of the low-frequency folded pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies it to the expansion coefficient quantization unit 27.
That is, for the band from the upper limit frequency Fb to the frequency Fc, the expansion coefficient calculation unit 26 divides the average of the high-frequency pseudo-amplitude spectrum values at the frequency bins by the average of the low-frequency folded pseudo-amplitude spectrum values at the frequency bins to calculate the expansion coefficient.
Once the expansion coefficient has been calculated, the process proceeds to step S19.
On the other hand, if it is determined in step S16 that the spectral characteristic does not indicate high tonality, the process proceeds to step S18.
In step S18, the expansion coefficient calculation unit 26 calculates an expansion coefficient for each divided band of the high band on the basis of the low-frequency folded pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies the coefficients to the expansion coefficient quantization unit 27.
That is, for example, the expansion coefficient calculation unit 26 divides the entire high band into the five bands B1 through B5 as shown in FIG. 4 and evaluates equation (4) described above for each band. In this case, one expansion coefficient is calculated for each of the bands B1 through B5.
Once the expansion coefficients have been calculated, the process proceeds to step S19.
When the expansion coefficients have been calculated in step S17 or step S18, in step S19 the expansion coefficient quantization unit 27 quantizes the expansion coefficients supplied from the expansion coefficient calculation unit 26 and supplies the resulting quantized expansion coefficients to the multiplexing unit 28.
In step S20, the multiplexing unit 28 multiplexes the quantized low-frequency spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectral characteristic determination unit 25, and the quantized expansion coefficients from the expansion coefficient quantization unit 27 to generate a code string. At this time, the multiplexing unit 28 first encodes the quantized low-frequency spectrum and the quantized expansion coefficients, and then multiplexes the encoded quantized low-frequency spectrum and quantized expansion coefficients with the spectral characteristic code.
The multiplexing unit 28 outputs the code string obtained by the multiplexing, and the encoding process ends.
As described above, the encoding device 11 determines the spectral characteristic of the input signal on the basis of the low-frequency spectrum feature quantity and the high-frequency spectrum feature quantity. The encoding device 11 then calculates, as the expansion coefficients used at decoding time to adjust the level of the high band in the frequency domain, different expansion coefficients depending on the spectral characteristic.
As a result, the level of the high band can be adjusted in the frequency domain at decoding time using the expansion coefficients, and the high-band level adjustment can be matched to the spectral characteristic. High-quality audio can therefore be obtained even in a low-resource environment, without increasing the algorithmic delay.
That is, because the high-band level adjustment can be performed in the frequency domain, the time delay caused by band extension at decoding time is reduced, and the increase in resources on the decoding side is also suppressed. In addition, because the high-band level adjustment can be performed according to the spectral characteristic, degradation of perceived sound quality can be suppressed for both highly tonal signals and signals with low tonality, so that higher-quality audio can be obtained.
<Configuration example of decoding device>
Next, a decoding device that decodes the code string output from the encoding device 11 will be described.
FIG. 6 is a diagram illustrating a configuration example of an embodiment of a decoding device to which the present technology is applied.
The decoding device 81 in FIG. 6 includes a decomposition unit 91, a spectrum inverse quantization unit 92, an expansion coefficient inverse quantization unit 93, an extended spectrum generation unit 94, and an IMDCT unit 95.
The code string output from the multiplexing unit 28 of the encoding device 11 is supplied to the decomposition unit 91. The decomposition unit 91 decomposes the supplied code string and obtains from it the quantized low-frequency spectrum, the spectral characteristic code, and the quantized expansion coefficients. The decomposition unit 91 also decodes the quantized low-frequency spectrum and the quantized expansion coefficients.
The decomposition unit 91 supplies the quantized low-frequency spectrum obtained from the code string to the spectrum inverse quantization unit 92, and supplies the spectral characteristic code obtained from the code string to the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94. The decomposition unit 91 also supplies the quantized expansion coefficients obtained from the code string to the expansion coefficient inverse quantization unit 93.
The spectrum inverse quantization unit 92 inversely quantizes the quantized low-frequency spectrum supplied from the decomposition unit 91 and supplies the resulting low-frequency spectrum to the extended spectrum generation unit 94 and the IMDCT unit 95. The expansion coefficient inverse quantization unit 93 inversely quantizes the quantized expansion coefficients supplied from the decomposition unit 91 on the basis of the spectral characteristic code supplied from the decomposition unit 91, and supplies the resulting expansion coefficients to the extended spectrum generation unit 94.
On the basis of the spectral characteristic code supplied from the decomposition unit 91, the extended spectrum generation unit 94 generates an extended spectrum from the expansion coefficients supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies it to the IMDCT unit 95.
The IMDCT unit 95 combines (synthesizes) the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, as the spectrum of the low band, with the extended spectrum supplied from the extended spectrum generation unit 94, as the spectrum of the high band (extension band). The IMDCT unit 95 then performs an orthogonal transform by IMDCT on the combined spectrum and outputs the resulting time-series signal as the audio signal obtained by decoding.
<Description of decoding process>
Next, the operation of the decoding device 81 will be described.
When a code string is supplied, the decoding device 81 starts the decoding process, decodes the code string, and outputs an audio signal. The decoding process performed by the decoding device 81 is described below with reference to the flowchart of FIG. 7.
In step S51, the decomposition unit 91 decomposes the supplied code string and obtains from it the quantized low-frequency spectrum, the spectral characteristic code, and the quantized expansion coefficients.
The decomposition unit 91 supplies the obtained quantized low-frequency spectrum to the spectrum inverse quantization unit 92, supplies the spectral characteristic code to the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94, and supplies the quantized expansion coefficients to the expansion coefficient inverse quantization unit 93. More precisely, the decomposition unit 91 decodes the quantized low-frequency spectrum and the quantized expansion coefficients, and supplies the decoded quantized low-frequency spectrum and quantized expansion coefficients to the spectrum inverse quantization unit 92 and the expansion coefficient inverse quantization unit 93, respectively.
In step S52, the spectrum inverse quantization unit 92 inversely quantizes the quantized low-frequency spectrum supplied from the decomposition unit 91 and supplies the resulting low-frequency spectrum to the extended spectrum generation unit 94 and the IMDCT unit 95.
In step S53, the expansion coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine, on the basis of the spectral characteristic code supplied from the decomposition unit 91, whether the spectral characteristic indicates high tonality.
For example, when the value of the spectral characteristic code is "1", the spectral characteristic is determined to indicate high tonality. In this case, the code string contains the quantized expansion coefficient for obtaining the single expansion coefficient calculated for the entire high band, so one quantized expansion coefficient is supplied from the decomposition unit 91 to the expansion coefficient inverse quantization unit 93.
Conversely, when the value of the spectral characteristic code is "0", the spectral characteristic is determined not to indicate high tonality, that is, to indicate a highly noise-like signal. In this case, the code string contains the quantized expansion coefficients for obtaining the expansion coefficients calculated for the respective bands constituting the high band, so as many quantized expansion coefficients as there are divided bands in the high band are supplied from the decomposition unit 91 to the expansion coefficient inverse quantization unit 93.
If it is determined in step S53 that the spectral characteristic indicates high tonality, in step S54 the expansion coefficient inverse quantization unit 93 inversely quantizes the single quantized expansion coefficient supplied from the decomposition unit 91 and supplies the resulting expansion coefficient to the extended spectrum generation unit 94.
In step S55, the extended spectrum generation unit 94 generates an extended spectrum on the basis of the single expansion coefficient supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies it to the IMDCT unit 95.
Specifically, in the same manner as the example described with reference to FIG. 3, the extended spectrum generation unit 94 folds the low-frequency spectrum over to the high-frequency side about the upper limit frequency Fb, and uses the resulting folded spectrum as a seed spectrum for obtaining the extended spectrum.
The extended spectrum generation unit 94 multiplies the entire seed spectrum, that is, the value of the seed spectrum at every frequency bin, by the single expansion coefficient to obtain the extended spectrum. In other words, the expansion coefficient adjusts the level of the seed spectrum to the level of the original high-frequency spectrum before encoding, and the result is used as the extended spectrum.
The extended spectrum obtained in this way is an estimate of the high-frequency spectrum of the original input signal, derived from the low-frequency spectrum obtained by decoding and the expansion coefficient.
Once the extended spectrum has been obtained, the process proceeds to step S58.
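Steps S54 and S55 for a highly tonal signal can be sketched as follows: fold the decoded low-frequency spectrum about Fb to form the seed spectrum, then scale every bin by the one expansion coefficient. The function name and the bin alignment at the fold point are assumptions, as is the requirement that the low band be at least as wide as the high band.

```python
def extended_spectrum_single(low_spectrum, num_high_bins, coefficient):
    """Build the extended (high-band) spectrum from a folded seed spectrum
    and one expansion coefficient shared by all high-band bins."""
    if num_high_bins > len(low_spectrum):
        raise ValueError("low band too narrow to fill the high band by folding")
    # Seed spectrum: low band mirrored about Fb, truncated at Fc.
    seed = low_spectrum[::-1][:num_high_bins]
    # Level adjustment in the frequency domain.
    return [coefficient * value for value in seed]


extended = extended_spectrum_single([1.0, -2.0, 3.0, 4.0], 2, 0.5)
```

Because the scaling is a per-bin multiplication of MDCT values, no additional time-domain filtering stage is involved in this path.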
On the other hand, if it is determined in step S53 that the spectral characteristic does not indicate high tonality, that is, that it indicates a highly noise-like signal, the process proceeds to step S56.
In step S56, the expansion coefficient inverse quantization unit 93 inversely quantizes the quantized expansion coefficients, supplied from the decomposition unit 91, for the respective bands constituting the high band, and supplies the resulting expansion coefficients to the extended spectrum generation unit 94. As a result, the expansion coefficients of the bands (regions) B1 through B5 shown in FIG. 4, for example, are obtained.
In step S57, the extended spectrum generation unit 94 generates an extended spectrum on the basis of the expansion coefficients of the respective bands supplied from the expansion coefficient inverse quantization unit 93 and the low-frequency spectrum supplied from the spectrum inverse quantization unit 92, and supplies it to the IMDCT unit 95.
Specifically, the extended spectrum generation unit 94 generates a seed spectrum by the same processing as in step S55, and multiplies each band (region) of the seed spectrum by the expansion coefficient of that band to obtain the extended spectrum.
For example, when the high band is divided into the five bands B1 through B5 as shown in FIG. 4, the band B1 portion of the seed spectrum, or more precisely the value of the seed spectrum at each frequency bin in band B1, is multiplied by the expansion coefficient of band B1 to generate the band B1 portion of the extended spectrum. Similarly, for the other bands B2 through B5, the corresponding portions of the seed spectrum are multiplied by the expansion coefficients of the respective bands to generate the corresponding portions of the extended spectrum.
Once the extended spectrum has been obtained, the process proceeds to step S58.
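The per-band scaling of step S57 can be sketched in the same style. Here `bands` is a hypothetical list of half-open bin ranges, indexed from the start of the high band, with one expansion coefficient per range; the actual band edges of B1 through B5 are not given in the text.

```python
def extended_spectrum_per_band(seed, bands, coefficients):
    """Scale each band of the seed spectrum by that band's coefficient."""
    if len(bands) != len(coefficients):
        raise ValueError("one coefficient is required per band")
    extended = list(seed)
    for (start, stop), coefficient in zip(bands, coefficients):
        for k in range(start, stop):
            extended[k] = coefficient * seed[k]
    return extended


seed = [1.0, 1.0, 2.0, 2.0, 2.0]
extended = extended_spectrum_per_band(seed, [(0, 2), (2, 5)], [0.5, 2.0])
```

Bins not covered by any band are left unchanged in this sketch; in the scheme described above the bands cover the whole high band, so that case should not arise.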
Although steps S55 and S57 have been described using an example in which the low-frequency spectrum is folded over to the high-frequency side to form the seed spectrum, the seed spectrum is not limited to this and may be generated in any manner. For example, a spectrum obtained by duplicating (copying) part of the frequency band of the low-frequency spectrum and pasting it into the high band may be used as the seed spectrum.
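The copy-based alternative mentioned above might look like the following sketch. The function name, the choice of source range, and the repetition of the copied patch when it is shorter than the high band are all assumptions; the text only says that part of the low-frequency spectrum is copied and pasted into the high band.

```python
def seed_by_copy(low_spectrum, src_start, src_stop, num_high_bins):
    """Form a seed spectrum by copying low_spectrum[src_start:src_stop]
    into the high band, repeating the patch if it is too short."""
    patch = low_spectrum[src_start:src_stop]
    if not patch:
        raise ValueError("source range is empty")
    seed = []
    while len(seed) < num_high_bins:
        seed.extend(patch)
    return seed[:num_high_bins]


seed = seed_by_copy([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 5, 4)
```

Unlike folding, copying preserves the frequency ordering of the source bins within each pasted patch.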
When the extended spectrum has been generated in step S55 or step S57, in step S58 the IMDCT unit 95 generates a time-series signal on the basis of the low-frequency spectrum supplied from the spectrum inverse quantization unit 92 and the extended spectrum supplied from the extended spectrum generation unit 94.
That is, the IMDCT unit 95 combines (synthesizes) the low-frequency spectrum and the extended spectrum to generate a spectrum containing all band components of the low band and the high band (extension band), and then performs IMDCT on the combined spectrum to obtain a time-series signal. A time-series signal to which high-frequency components have been added by band extension is thereby obtained.
The IMDCT unit 95 outputs the time-series signal obtained in this way as the audio signal obtained by decoding, and the decoding process ends.
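Step S58 can be illustrated with a direct-form IMDCT applied to the concatenated spectrum. This is a sketch under assumptions: the text does not state the transform's normalization, window, or overlap-add handling, so the code uses one common textbook form with a 1/N scale and omits windowing and overlap-add between frames. The function names are hypothetical.

```python
import math


def imdct(spectrum):
    """Direct O(N^2) IMDCT: N coefficients in, 2N time samples out.
    The 1/N normalization is one common convention, assumed here."""
    n = len(spectrum)
    out = []
    for t in range(2 * n):
        acc = 0.0
        for k, x in enumerate(spectrum):
            acc += x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
        out.append(acc / n)
    return out


def decode_frame(low_spectrum, extended_spectrum):
    """Concatenate the low band and the extension band, then transform.
    Windowing and overlap-add with neighboring frames are omitted."""
    full_spectrum = list(low_spectrum) + list(extended_spectrum)
    return imdct(full_spectrum)


samples = decode_frame([1.0, 0.0], [0.0, 0.0])  # 4 coefficients, 8 samples
```

A practical decoder would use a fast algorithm and apply a synthesis window before overlap-adding consecutive frames.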
As described above, the decoding device 81 obtains expansion coefficients matched to the spectral characteristic by decoding and inverse quantization, and generates the extended spectrum from the obtained expansion coefficients and the seed spectrum obtained by folding the low-frequency spectrum over to the high-frequency side.
By using expansion coefficients matched to the spectral characteristic to adjust the level of the seed spectrum, which serves as the high-frequency component, and taking the result as the extended spectrum, the level of the high band can be adjusted in the frequency domain and the high-band level adjustment can be matched to the spectral characteristic.
As a result, high-quality audio can be obtained even in a low-resource environment without increasing the algorithmic delay. That is, performing the level adjustment in the frequency domain reduces the delay caused by band extension at decoding time and suppresses the increase in resources. In addition, degradation of perceived sound quality due to band extension can be suppressed for both highly tonal signals and signals with low tonality, so that higher-quality audio can be obtained.
<Generation of the extended spectrum>
Here, the generation of the extended spectrum by the extended spectrum generation unit 94 of the decoding device 81 will be described in more detail.
As described above, the extended spectrum generation unit 94 generates the extended spectrum differently depending on whether, according to the spectrum characteristic code, the original signal before encoding is a signal with high tonality or an ordinary, noise-like signal.
For example, as shown in FIG. 8, a signal with high tonality and an ordinary noise-like signal have different spectral shapes. In FIG. 8, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In FIG. 8, curve C21 represents the spectrum of a noise-like signal, that is, an ordinary signal, and curve C22 represents the spectrum of a signal with high tonality.
The noise-like signal represented by curve C21 has no portion whose level stands out anywhere in the frequency band, and its spectrum has the shape of a gently sloping hill. In other words, a noise-like signal has no portion where energy is concentrated.
In contrast, in the high-tonality signal represented by curve C22, energy is concentrated at specific frequencies, and the spectrum forms sharp peaks there. In other words, the spectrum of a high-tonality signal protrudes at the frequencies where energy is concentrated and is not smooth.
When the extended spectrum is generated, a spectrum derived from the low-frequency spectrum is used as the seed spectrum, for example the low-frequency spectrum folded over at the upper limit frequency Fb, or part of the low-frequency spectrum copied and pasted into the high band. The seed spectrum is then level-adjusted, that is, amplitude-adjusted, by the extension coefficients to form the extended spectrum.
For an ordinary noise-like signal, the phase relationship between neighboring spectral components matters little perceptually; what matters is the amplitude level. Therefore, when adjusting the level of the seed spectrum, it is desirable to adjust in fine units so that the level (amplitude) of the seed spectrum comes as close as possible to the level of the high-frequency spectrum of the original signal before encoding.
For example, suppose that, as shown in FIG. 9, the high band is divided into four bands at encoding and an extension coefficient is calculated for each band. In FIG. 9, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In this example, the frequency band of the high-frequency spectrum, that is, the high band from the upper limit frequency Fb to the frequency Fc, is divided into four bands (regions), bands B11 to B14. The bands obtained by the division are wider the closer they are to the frequency Fc.
In this case, when the input signal is encoded, the average value of the high-frequency pseudo-amplitude spectrum is calculated for each of bands B11 to B14. In this example, straight lines L11 to L14 represent the average values of the high-frequency pseudo-amplitude spectrum in bands B11 to B14, respectively, that is, the average amplitudes of the high-frequency spectrum.
The average value of the high-frequency pseudo-amplitude spectrum obtained for each band is divided by the average value of the folded low-frequency pseudo-amplitude spectrum in the same band, and the resulting value is stored in the code string as the extension coefficient and transmitted to the decoding device 81.
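The per-band calculation described above can be sketched as follows. The function names, the representation of bands as a list of bin-index edges, and the small `eps` guard against division by zero are assumptions made for illustration; the text only specifies the ratio of the two band averages.

```python
def band_means(values, band_edges):
    """Mean of `values` over each band [band_edges[i], band_edges[i + 1])."""
    return [
        sum(values[lo:hi]) / (hi - lo)
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]

def extension_coefficients(high_pseudo_amp, folded_low_pseudo_amp, band_edges, eps=1e-12):
    """One coefficient per band: mean high-band amplitude / mean folded low-band amplitude."""
    high = band_means(high_pseudo_amp, band_edges)
    low = band_means(folded_low_pseudo_amp, band_edges)
    # eps is a hypothetical guard so that an all-zero band does not divide by zero.
    return [h / max(l, eps) for h, l in zip(high, low)]

# Toy high band of 6 bins split into two bands of widths 2 and 4.
coeffs = extension_coefficients(
    [4.0, 4.0, 2.0, 2.0, 2.0, 2.0],
    [8.0, 8.0, 8.0, 8.0, 8.0, 8.0],
    [0, 2, 6],
)
```

In the toy input the second band is wider than the first, mirroring the layout in which bands closer to Fc are wider.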
The decoding device 81 then adjusts the level of the seed spectrum obtained from the low-frequency spectrum using the extension coefficients, as shown in FIG. 10. In FIG. 10, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency. Portions of FIG. 10 that correspond to those in FIG. 9 are denoted by the same reference signs, and their description is omitted as appropriate.
In FIG. 10, curve C31 represents the low-frequency spectrum obtained by decoding the code string, and curve C32 represents the seed spectrum obtained from the low-frequency spectrum.
In this example, the low-frequency spectrum represented by curve C31 is folded over to the high-frequency side at the upper limit frequency Fb to form the seed spectrum represented by curve C32.
Each of bands B11 to B14 of this seed spectrum is multiplied by the extension coefficient calculated for that band. The level of the seed spectrum is thereby adjusted in each of bands B11 to B14 so that the level of each band, more precisely its average amplitude, approaches the average amplitude of the high-frequency spectrum of the original signal, as indicated by the arrows in the figure.
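A minimal sketch of the decoder-side folding and per-band level adjustment follows. The exact bin mapping used for folding is an assumption: here the bin just above Fb takes the value of the bin just below Fb, and so on outward. Function names and the band-edge representation are likewise hypothetical.

```python
def fold_seed_spectrum(low_spectrum, num_high_bins):
    """Mirror the low-band bins about the upper limit frequency Fb (assumed mapping)."""
    n = len(low_spectrum)
    if num_high_bins > n:
        raise ValueError("not enough low-band bins to fold")
    return [low_spectrum[n - 1 - k] for k in range(num_high_bins)]

def apply_band_coefficients(seed, band_edges, coeffs):
    """Multiply every bin of each band by that band's extension coefficient."""
    out = list(seed)
    for (lo, hi), c in zip(zip(band_edges[:-1], band_edges[1:]), coeffs):
        for k in range(lo, hi):
            out[k] = seed[k] * c
    return out

low = [1.0, -2.0, 3.0, -4.0]
seed = fold_seed_spectrum(low, 4)
extended = apply_band_coefficients(seed, [0, 1, 4], [0.5, 0.25])
```

Note that the gain changes abruptly at the band boundary (bin 1 in the toy input), which is where the text below locates the loss of tonality for tonal signals.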
However, when the low-frequency spectrum is that of a signal with high tonality, multiplying the seed spectrum by a different extension coefficient for each band brings the level of each band of the extended spectrum, that is, its average amplitude, close to the average amplitude of the original high-frequency spectrum before encoding, but the phase relationship of the spectrum is badly disrupted in each band.
As a result, the tonality of the extended spectrum is lost, as shown for example in FIG. 11. In FIG. 11, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In this example, curve C41 represents the MDCT spectrum of the input signal to be encoded, and curve C42 represents the spectrum obtained by combining the low-frequency spectrum and the extended spectrum generated when that input signal is decoded. Accordingly, in the spectrum represented by curve C42, the portion from the frequency Dc to the upper limit frequency Fb is the low-frequency spectrum, and the portion from the upper limit frequency Fb to the frequency Fc is the extended spectrum.
In this example, the original input signal has high tonality in both the low band and the high band. If, when decoding such an input signal, the level of the seed spectrum is adjusted with a different extension coefficient for each of the divided bands of the high band, the phase relationship of the spectrum is badly disrupted, as shown by curve C42, and the tonality of the extension band is lost.
In the spectrum represented by curve C42, the waveform of the high-band portion, that is, the extended spectrum, has collapsed, and the tonality of the original MDCT spectrum has been lost. The waveform is especially prone to collapse at the boundaries between the divided bands of the high band, and tonality is easily lost there.
The seed spectrum obtained by folding the low-frequency spectrum preserves the phase relationship of the spectrum as long as it is left as is, that is, before level adjustment by the extension coefficients, and so its tonality is also preserved.
However, unless the level (amplitude) of the seed spectrum is adjusted, the amplitude level of the high-frequency spectrum of the original input signal cannot be reflected in the extended spectrum. The loudness of the high band, that is, the extension band, then differs from that of the original high band, and appropriate band extension cannot be achieved. In other words, higher-quality audio cannot be obtained.
In the present technology, therefore, for a signal with high tonality the level of the seed spectrum is adjusted in as few units as possible, which both preserves tonality in the extended spectrum and reflects the amplitude level.
Specifically, at encoding, the extension coefficient calculation unit 26 divides the average value of the high-frequency pseudo-amplitude spectrum over the entire high band (extension band) by the average value of the folded low-frequency pseudo-amplitude spectrum over the entire high band, thereby calculating a single extension coefficient for the extension band.
At decoding, the extended spectrum generation unit 94 multiplies the entire seed spectrum by the single extension coefficient to obtain the extended spectrum. That is, the level of the seed spectrum is adjusted with the entire extension band (high band) as one unit to form the extended spectrum.
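The single-coefficient case can be sketched in the same way. The function names and the `eps` guard are assumptions for illustration; the ratio of whole-band averages and the uniform scaling follow the description above.

```python
def single_extension_coefficient(high_pseudo_amp, folded_low_pseudo_amp, eps=1e-12):
    """Encoder side: ratio of whole-band averages, giving one coefficient for the extension band."""
    mean_high = sum(high_pseudo_amp) / len(high_pseudo_amp)
    mean_low = sum(folded_low_pseudo_amp) / len(folded_low_pseudo_amp)
    # eps is a hypothetical guard against an all-zero folded spectrum.
    return mean_high / max(mean_low, eps)

def scale_seed(seed, coeff):
    """Decoder side: multiply every bin of the seed spectrum by the same coefficient."""
    return [coeff * s for s in seed]

coeff = single_extension_coefficient([1.0, 3.0, 1.0, 3.0], [4.0, 12.0, 4.0, 12.0])
extended = scale_seed([-4.0, 12.0, 4.0, -12.0], coeff)
```

Because every bin is scaled by the same factor, the ratios and signs between neighboring bins of the seed spectrum are unchanged, which is the property the text relies on to preserve tonality.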
Adjusting the level with the extension band as the unit in this way preserves the tonality of the input signal while also bringing the overall high-band amplitude level of the extended spectrum close to that of the original input signal, as shown for example in FIG. 12. In FIG. 12, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In FIG. 12, curves C51 to C53 represent, respectively, the MDCT spectrum of the original input signal, the low-frequency spectrum obtained by inverse quantization at decoding, and the seed spectrum.
In this example, the MDCT spectrum represented by curve C51 has portions where energy is concentrated at specific frequencies in both its low-band part and its high-band part, that is, in both the low-frequency spectrum and the high-frequency spectrum, and is thus a signal with high tonality. In addition, in the MDCT spectrum represented by curve C51, the average amplitude of the low-frequency spectrum is larger than that of the high-frequency spectrum.
At encoding, the average value of the high-frequency pseudo-amplitude spectrum is obtained over the entire band of the high-frequency spectrum of such an MDCT spectrum, and a single extension coefficient is calculated. In FIG. 12, straight line L21 represents the average value of the high-frequency pseudo-amplitude spectrum in the high band (extension band), that is, the average amplitude of the high-frequency spectrum.
At decoding, the low-frequency spectrum represented by curve C52 is folded over to form the seed spectrum represented by curve C53, and this seed spectrum is level-adjusted by the extension coefficient, as indicated by the arrow in the figure, to form the extended spectrum.
In doing so, the single extension coefficient brings the average amplitude of the extended spectrum over the entire high band close to the average value of the high-frequency pseudo-amplitude spectrum represented by straight line L21. Because the level at every frequency of the seed spectrum is adjusted by the same amount, the amplitude level can be adjusted appropriately without disrupting the phase relationship, that is, while preserving tonality. As a result, higher-quality audio can be obtained.
Moreover, when there is only a single extension coefficient, the amount of additional information required for band extension that is stored in the code string output from the encoding device 11 is also reduced. That many more bits can be allocated to quantizing the low-frequency spectrum, and an improvement in overall sound quality can be expected.
<Second Embodiment>
<Generation of an extended spectrum from random noise>
When the low-band tonality of an input signal is high, its high-band tonality is usually high as well. For this reason, in the encoding process described above, the input signal to be encoded is regarded as having the spectral characteristic of high tonality when both the low-frequency spectrum feature quantity and the high-frequency spectrum feature quantity are below the threshold.
Although it does not occur often, there are also input signals whose spectral characteristic is high tonality in the low-frequency spectrum and low tonality in the high-frequency spectrum, as shown for example in FIG. 13. In FIG. 13, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In FIG. 13, curve C61 represents the MDCT spectrum of the input signal to be encoded. In this MDCT spectrum, the portion from the frequency Dc to the upper limit frequency Fb is the low-frequency spectrum, and the portion from the upper limit frequency Fb to the frequency Fc is the high-frequency spectrum.
The low-frequency spectrum has portions where energy is concentrated at specific frequencies and is thus a signal with high tonality. The high-frequency spectrum, in contrast, has no portion where energy is concentrated at a specific frequency and is a signal with low tonality, that is, a noise-like signal.
Suppose that such an input signal, with high tonality in the low band but low tonality in the high band, is encoded and that band extension is performed at decoding. If the seed spectrum is generated by folding or partially copying the low-frequency spectrum and the extended spectrum is generated from that seed spectrum, then, as shown for example in FIG. 14, the extended spectrum may exhibit strong tonality instead of the original noise-like character. In FIG. 14, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In this example, curve C71 represents the low-frequency spectrum obtained by inversely quantizing the quantized low-frequency spectrum, and curve C72 represents the extended spectrum.
In this example, the high-frequency spectrum of the original time-series signal had low tonality, but because the low-frequency spectrum has high tonality, the extended spectrum obtained by folding the low-frequency spectrum and adjusting its level with the extension coefficients has high tonality. That is, band extension has introduced into the high band a characteristic that differs from that of the original signal.
When high tonality that was not originally present appears in the high band in this way, it causes audible artifacts in the time-series signal (audio signal) obtained by the decoding process, such as a metallic sound being mixed in.
Therefore, when the tonality of the low-frequency spectrum is high and that of the high-frequency spectrum is low, the extended spectrum may be generated using random noise, as shown for example in FIG. 15, instead of using the folded low-frequency spectrum as the seed spectrum. In FIG. 15, the vertical axis indicates the spectral value, that is, the level, and the horizontal axis indicates frequency.
In FIG. 15, curves C81 to C83 represent, respectively, the MDCT spectrum, the low-frequency spectrum obtained by inversely quantizing the quantized low-frequency spectrum, and the extended spectrum.
In this example, the high band of the MDCT spectrum is divided into three bands, bands B31 to B33, and the bandwidth increases with frequency. Once the high band has been divided into bands B31 to B33, an envelope coefficient is calculated for each band at encoding as envelope information indicating the envelope of that band. For example, the envelope coefficient is the average value of the high-frequency pseudo-amplitude spectrum over the frequency bins in the band.
In FIG. 15, straight lines L31 to L33 indicate the envelope coefficients calculated for bands B31 to B33, respectively.
The envelope coefficient is extension coefficient information used to adjust the level of random noise, serving as a noise signal, when the extended spectrum is generated. It is called an envelope coefficient here to distinguish it from the extension coefficient calculated from the folded low-frequency pseudo-amplitude spectrum and the high-frequency pseudo-amplitude spectrum. The number of bands into which the high band is divided when calculating the envelope coefficients may be the same as or different from the number used when calculating the extension coefficients.
Once the envelope coefficients have been calculated, they are quantized and encoded, then multiplexed with the quantized low-frequency spectrum and the spectrum characteristic code to generate the code string.
On the decoding side, which receives the code string, the extended spectrum is generated using the envelope coefficients obtained from the code string and random noise.
That is, at decoding, a random number normalized to the range -1.0 to 1.0 is generated for each frequency bin of bands B31 to B33, which form the extension band, and the noise signal made up of these per-bin random numbers is used as the random noise. The random noise is then multiplied by the envelope coefficients to obtain the extended spectrum.
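The noise-based generation can be sketched as follows. The text specifies only that the random numbers are normalized to the range -1.0 to 1.0; the uniform distribution, the function name, and the band-edge representation used here are assumptions.

```python
import random

def noise_extended_spectrum(band_edges, envelope_coeffs, rng=None):
    """Per bin: a random value in [-1, 1] scaled by the envelope coefficient of its band."""
    if rng is None:
        rng = random.Random()
    spectrum = []
    for (lo, hi), env in zip(zip(band_edges[:-1], band_edges[1:]), envelope_coeffs):
        for _ in range(lo, hi):
            # Uniform noise is an assumption; the source does not name a distribution.
            spectrum.append(env * rng.uniform(-1.0, 1.0))
    return spectrum

# Three bands of widths 2, 3 and 4 bins, wider toward higher frequency.
edges = [0, 2, 5, 9]
envelope = [0.8, 0.4, 0.1]
extended = noise_extended_spectrum(edges, envelope, random.Random(0))
```

Passing a seeded `random.Random` instance makes the sketch reproducible; a decoder has no need to reproduce the encoder's noise, since only the envelope is transmitted.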
Because the extended spectrum obtained in this way is generated from random noise made of normalized random numbers, its energy is not concentrated at any specific frequency, as shown by curve C83, and it is a noise-like spectrum. In addition, because the extended spectrum is obtained by adjusting the level of the random noise with the envelope coefficients, its envelope is close to the high-band envelope of the original MDCT spectrum.
The time-series signal obtained by decoding therefore has high tonality in the low-frequency spectrum and low tonality in the high-frequency spectrum, like the original input signal that was encoded.
<Description of the encoding process>
Next, the encoding process performed by the encoding device 11 when the envelope coefficients described above are generated will be described.
The encoding process performed by the encoding device 11 will be described below with reference to the flowchart of FIG. 16. The processing in steps S91 to S94 is the same as that in steps S11 to S14 of FIG. 5, and its description is therefore omitted.
In step S95, the spectrum characteristic determination unit 25 generates a spectrum characteristic code indicating the spectral characteristic, based on the low-frequency spectrum feature quantity supplied from the low-frequency feature quantity extraction unit 23 and the high-frequency spectrum feature quantity supplied from the high-frequency feature quantity extraction unit 24.
Specifically, when SFL, the low-frequency spectrum feature quantity, and SFH, the high-frequency spectrum feature quantity, are both below the threshold, the spectrum characteristic determination unit 25 generates a spectrum characteristic code with the value "1". The spectrum characteristic code "1" indicates that both the low band and the high band of the input signal (MDCT spectrum) have high tonality as their spectral characteristic.
When SFL, the low-frequency spectrum feature quantity, is below the threshold and SFH, the high-frequency spectrum feature quantity, is equal to or greater than the threshold, the spectrum characteristic determination unit 25 generates a spectrum characteristic code with the value "2". The spectrum characteristic code "2" indicates that the low band of the input signal (low-frequency spectrum) has high tonality and the high band of the input signal (high-frequency spectrum) has low tonality, that is, is noise-like.
Further, when SFL, the low-frequency spectrum feature quantity, is equal to or greater than the threshold, the spectrum characteristic determination unit 25 generates a spectrum characteristic code with the value "0". The spectrum characteristic code "0" indicates that the input signal has low tonality as its spectral characteristic.
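The three-way decision above can be written as a small function. This is a sketch: the function name is hypothetical, and using one shared threshold for SFL and SFH is an assumption, since this passage refers only to "the threshold".

```python
def spectrum_characteristic_code(sfl, sfh, threshold):
    """Return 0, 1 or 2 from the low-band (SFL) and high-band (SFH) feature quantities."""
    if sfl >= threshold:
        return 0  # low tonality: ordinary, noise-like signal
    if sfh < threshold:
        return 1  # high tonality in both the low band and the high band
    return 2      # high tonality in the low band, noise-like high band

codes = [
    spectrum_characteristic_code(0.1, 0.1, 0.5),
    spectrum_characteristic_code(0.1, 0.9, 0.5),
    spectrum_characteristic_code(0.9, 0.1, 0.5),
]
```

Note that when SFL is at or above the threshold the code is 0 regardless of SFH, matching the description of code "0".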
The spectrum characteristic determination unit 25 supplies the generated spectrum characteristic code to the extension coefficient calculation unit 26, the extension coefficient quantization unit 27, and the multiplexing unit 28.
In step S96, the extension coefficient calculation unit 26 and the extension coefficient quantization unit 27 determine, based on the spectrum characteristic code supplied from the spectrum characteristic determination unit 25, whether the spectral characteristics of both the low band and the high band indicate high tonality.
For example, when the value of the spectrum characteristic code is "1", it is determined that the spectral characteristics of the low band and the high band indicate high tonality.
If it is determined in step S96 that the spectral characteristics of the low band and the high band indicate high tonality, the process proceeds to step S97.
In step S97, the extension coefficient calculation unit 26 calculates a single extension coefficient for the entire high band, based on the folded low-frequency pseudo-amplitude spectrum from the low-frequency feature quantity extraction unit 23 and the high-frequency pseudo-amplitude spectrum from the high-frequency feature quantity extraction unit 24, and supplies it to the extension coefficient quantization unit 27.
In step S97, processing similar to that in step S17 of FIG. 5 is performed. After the extension coefficient has been calculated in step S97, the process proceeds to step S101.
If it is determined in step S96 that the spectral characteristics of the low band and the high band do not both indicate high tonality, the process proceeds to step S98.
In step S98, the extension coefficient calculation unit 26 and the extension coefficient quantization unit 27 determine, based on the spectrum characteristic code, whether the spectral characteristic of the low band indicates high tonality and that of the high band indicates low tonality.
For example, when the value of the spectrum characteristic code is "2", it is determined that the spectral characteristic of the low band indicates high tonality and that of the high band indicates low tonality.
If it is determined in step S98 that the spectral characteristic of the low band indicates high tonality and that of the high band indicates low tonality, the process proceeds to step S99.
 ステップS99において、拡張係数算出部26は、高域特徴量抽出部24からの高域疑似振幅スペクトルに基づいて、高域の分割された帯域ごとに包絡係数を算出し、拡張係数量子化部27に供給する。 In step S99, the expansion coefficient calculation unit 26 calculates an envelope coefficient for each divided high frequency band based on the high frequency pseudo-amplitude spectrum from the high frequency characteristic amount extraction unit 24, and the expansion coefficient quantization unit 27 To supply.
 すなわち、例えば拡張係数算出部26は、図15に示したように高域全体を帯域B31乃至帯域B33の3つの帯域に分割し、各帯域内の周波数ビンの高域疑似振幅スペクトルの平均値をそれらの帯域の包絡係数として算出する。 That is, for example, the expansion coefficient calculation unit 26 divides the entire high frequency band into three bands B31 to B33 as shown in FIG. 15, and calculates the average value of the high frequency pseudo amplitude spectrum of the frequency bins in each band. It is calculated as an envelope coefficient of those bands.
Once the envelope coefficients have been calculated, the process proceeds to step S101.
On the other hand, if it is not determined in step S98 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the process proceeds to step S100.
In step S100, the extension coefficient calculation unit 26 calculates an extension coefficient for each of the bands into which the high band is divided, based on the low-band folded pseudo-amplitude spectrum from the low-band feature amount extraction unit 23 and the high-band pseudo-amplitude spectrum from the high-band feature amount extraction unit 24, and supplies the extension coefficients to the extension coefficient quantization unit 27. Step S100 performs the same processing as step S18 in FIG. 5. Once the extension coefficients have been calculated in step S100, the process proceeds to step S101.
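As a rough illustration of step S100, the sketch below computes one extension coefficient per band as the ratio of the average high-band pseudo-amplitude to the average folded low-band pseudo-amplitude in the same band. The ratio form follows configuration [13]; the function name, the `eps` guard against division by zero, and the band layout are assumptions, and the exact procedure of step S18 is not reproduced here.

```python
def extension_coefficients(high_amp, folded_low_amp, band_edges, eps=1e-12):
    """Per-band ratio of average high-band to average folded low-band amplitude.

    high_amp:       high-band pseudo-amplitude, one value per bin.
    folded_low_amp: low-band pseudo-amplitude folded into the high band,
                    aligned bin-for-bin with high_amp.
    band_edges:     bin indices delimiting the bands; [0, n] yields a single
                    coefficient for the whole high band.
    eps:            hypothetical guard that avoids division by zero.
    """
    coeffs = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        width = stop - start
        high_mean = sum(high_amp[start:stop]) / width
        seed_mean = sum(folded_low_amp[start:stop]) / width
        coeffs.append(high_mean / max(seed_mean, eps))
    return coeffs


high = [1.0, 1.0, 2.0, 2.0]
folded = [4.0, 4.0, 4.0, 4.0]
print(extension_coefficients(high, folded, [0, 2, 4]))  # [0.25, 0.5]
print(extension_coefficients(high, folded, [0, 4]))     # [0.375]
```

Passing a single band covering the whole high band gives the single-coefficient case, while several bands give the per-band case.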
When an extension coefficient has been calculated in step S97 or step S100, or envelope coefficients have been calculated in step S99, the extension coefficient quantization unit 27 quantizes, in step S101, the extension coefficient or envelope coefficients supplied from the extension coefficient calculation unit 26.
That is, when step S97 or step S100 has been performed and an extension coefficient is supplied, the extension coefficient quantization unit 27 quantizes the extension coefficient and supplies the resulting quantized extension coefficient to the multiplexing unit 28. When step S99 has been performed and envelope coefficients are supplied, the extension coefficient quantization unit 27 quantizes the envelope coefficients and supplies the resulting quantized envelope coefficients to the multiplexing unit 28. In either case, for example, scalar quantization or vector quantization is applied to the extension coefficient or the envelope coefficients.
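The text names scalar quantization only as one option and gives no parameters. The following sketch therefore assumes a uniform scalar quantizer in the decibel domain; the step size, bit width, floor level, and function names are all hypothetical choices made for illustration.

```python
import math

STEP_DB = 1.5     # hypothetical quantization step in dB
BITS = 6          # hypothetical index width in bits
FLOOR_DB = -60.0  # hypothetical level represented by index 0


def quantize(coeff):
    """Map a non-negative coefficient to an integer index (uniform in dB)."""
    level_db = 20.0 * math.log10(max(coeff, 1e-9))
    index = round((level_db - FLOOR_DB) / STEP_DB)
    return min(max(index, 0), (1 << BITS) - 1)  # clamp to the index range


def dequantize(index):
    """Recover the coefficient value represented by an index."""
    return 10.0 ** ((FLOOR_DB + index * STEP_DB) / 20.0)


index = quantize(1.0)
print(index, dequantize(index))  # 40 1.0
```

Within the representable range, the reconstruction error of such a quantizer is at most half a step (0.75 dB here). Vector quantization would instead map the whole set of per-band coefficients to a single codebook index.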
In step S102, the multiplexing unit 28 multiplexes the quantized low-band spectrum from the spectrum quantization unit 22, the spectral characteristic code from the spectral characteristic determination unit 25, and the quantized extension coefficient or quantized envelope coefficients from the extension coefficient quantization unit 27 to generate a code string. Here, the multiplexing unit 28 encodes the quantized low-band spectrum and the quantized extension coefficient or quantized envelope coefficients before multiplexing them.
The multiplexing unit 28 outputs the code string obtained by the multiplexing, and the encoding process ends.
As described above, the encoding device 11 determines the spectral characteristics of the input signal based on the low-band spectral feature amount and the high-band spectral feature amount. The encoding device 11 then calculates, according to the spectral characteristics, an extension coefficient or envelope coefficients as information for obtaining the extended spectrum at decoding time.
As a result, an appropriate extended spectrum can be obtained at decoding time using the extension coefficient or envelope coefficients, and high-quality sound can be obtained even in a low-resource environment without increasing the algorithmic delay. In particular, when the extended spectrum is generated using envelope coefficients, an extended spectrum with low tonality can be obtained even when the low-band spectrum has high tonality.
<Description of decoding process>
Next, the decoding process performed by the decoding device 81 when the encoding device 11 has performed the encoding process described with reference to FIG. 16 will be described with reference to the flowchart of FIG. 17.
The processing in steps S141 and S142 is the same as that in steps S51 and S52 of FIG. 7, and its description is therefore omitted. In step S141, however, either the quantized extension coefficient or the quantized envelope coefficients obtained by decomposing the code string are supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93.
In step S143, the extension coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine, based on the spectral characteristic code supplied from the decomposition unit 91, whether the low-band and high-band spectral characteristics both indicate high tonality.
For example, when the value of the spectral characteristic code is "1", it is determined that the low-band and high-band spectral characteristics indicate high tonality. In this case, the code string contains a single quantized extension coefficient, which is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93.
If it is determined in step S143 that the low-band and high-band spectral characteristics indicate high tonality, steps S144 and S145 are performed to generate the extended spectrum, which is supplied to the IMDCT unit 95.
Steps S144 and S145 are the same as steps S54 and S55 of FIG. 7, and their description is therefore omitted. After step S145, the process proceeds to step S151.
If it is not determined in step S143 that the low-band and high-band spectral characteristics indicate high tonality, the process proceeds to step S146.
In step S146, the extension coefficient inverse quantization unit 93 and the extended spectrum generation unit 94 determine, based on the spectral characteristic code, whether the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality. For example, when the value of the spectral characteristic code is "2", it is determined that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality.
If it is determined in step S146 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, the process proceeds to step S147. In this case, the quantized envelope coefficient for each band of the high band is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93.
In step S147, the extension coefficient inverse quantization unit 93 inversely quantizes the quantized envelope coefficients, supplied from the decomposition unit 91, for the plurality of bands constituting the high band, and supplies the resulting envelope coefficients to the extended spectrum generation unit 94. As a result, for example, the envelope coefficients L31 to L33 of the bands B31 to B33 shown in FIG. 15 are obtained.
In step S148, the extended spectrum generation unit 94 generates an extended spectrum based on the envelope coefficient of each band supplied from the extension coefficient inverse quantization unit 93, and supplies it to the IMDCT unit 95.
Specifically, the extended spectrum generation unit 94 generates random noise by assigning to each frequency bin of the extension band a random number normalized to the range of -1.0 to 1.0, and multiplies the random noise value in each frequency bin by the envelope coefficient of the band containing that bin to obtain the extended spectrum.
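The noise-based generation of step S148 can be sketched as follows. The uniform distribution, the function name, the optional `rng` argument, and the band layout are assumptions; the text specifies only that the random numbers are normalized to the range of -1.0 to 1.0.

```python
import random


def noise_extended_spectrum(envelope_coeffs, band_edges, rng=None):
    """Build an extended spectrum by scaling random noise band by band.

    envelope_coeffs: one dequantized envelope coefficient per band.
    band_edges:      bin indices delimiting the bands (hypothetical layout).
    rng:             optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    spectrum = []
    for coeff, start, stop in zip(envelope_coeffs, band_edges[:-1], band_edges[1:]):
        for _ in range(stop - start):
            # Random value in [-1.0, 1.0] scaled by the band's envelope coefficient.
            spectrum.append(coeff * rng.uniform(-1.0, 1.0))
    return spectrum


ext = noise_extended_spectrum([3.0, 1.0, 0.5], [0, 4, 8, 12], random.Random(0))
print(len(ext))  # 12
```

Because the fine structure comes from noise rather than from the low-band spectrum, the resulting extended spectrum has low tonality regardless of how tonal the low band is.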
Once the extended spectrum has been generated, the process proceeds to step S151.
Further, if it is not determined in step S146 that the low-band spectral characteristic indicates high tonality and the high-band spectral characteristic indicates low tonality, steps S149 and S150 are performed.
In this case, the quantized extension coefficient for each band of the high band is supplied from the decomposition unit 91 to the extension coefficient inverse quantization unit 93 and inversely quantized, and the extended spectrum is generated from the resulting extension coefficients and the low-band spectrum. Steps S149 and S150 are the same as steps S56 and S57 of FIG. 7, and their description is therefore omitted.
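For illustration, generating the extended spectrum from extension coefficients and the low-band spectrum might look like the sketch below. How the seed spectrum is derived is described with steps S56 and S57 and is not repeated here, so mirroring the upper end of the low band into the extension band is an assumption of this sketch, as are the function names and the band layout.

```python
def seed_spectrum(low_spectrum, n_ext):
    """Mirror the top n_ext low-band bins into the extension band (assumed folding)."""
    if n_ext > len(low_spectrum):
        raise ValueError("extension band is wider than the low band")
    return [low_spectrum[-1 - i] for i in range(n_ext)]


def scaled_extended_spectrum(low_spectrum, ext_coeffs, band_edges):
    """Scale the seed spectrum band by band with the extension coefficients.

    ext_coeffs: one coefficient per band; a single coefficient with
                band_edges = [0, n] adjusts the whole extension band at once.
    """
    seed = seed_spectrum(low_spectrum, band_edges[-1])
    out = []
    for coeff, start, stop in zip(ext_coeffs, band_edges[:-1], band_edges[1:]):
        out.extend(coeff * v for v in seed[start:stop])
    return out


low = [1.0, 2.0, 3.0, 4.0]
print(scaled_extended_spectrum(low, [0.5, 0.25], [0, 2, 4]))  # [2.0, 1.5, 0.5, 0.25]
print(scaled_extended_spectrum(low, [0.5], [0, 4]))           # [2.0, 1.5, 1.0, 0.5]
```

The first call corresponds to per-band level adjustment, and the second to adjusting the level of the entire extension band with a single coefficient.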
Once the extended spectrum has been generated in this way, the process proceeds to step S151.
When the extended spectrum has been generated by step S145, step S148, or step S150, step S151 is performed to generate a time-series signal. Step S151 is the same as step S58 of FIG. 7, and its description is therefore omitted.
When the time-series signal obtained in step S151 has been output as the audio signal obtained by decoding, the decoding process ends.
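The branching of FIG. 17 on the spectral characteristic code can be summarized by the short sketch below. The function name and the returned tuples are illustrative only; the step labels and code values "1" and "2" are those given in the description.

```python
def decoding_path(spectral_characteristic_code):
    """Return the pair of steps that generate the extended spectrum for a code."""
    if spectral_characteristic_code == 1:
        # Low and high bands both highly tonal: single extension coefficient.
        return ("S144", "S145")
    if spectral_characteristic_code == 2:
        # Low band tonal, high band not: per-band envelope coefficients and noise.
        return ("S147", "S148")
    # Otherwise: per-band extension coefficients applied to the low-band spectrum.
    return ("S149", "S150")


for code in (1, 2, 0):
    print(code, decoding_path(code))
```

Whichever branch is taken, processing then continues with step S151, where the time-series signal is generated.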
As described above, the decoding device 81 obtains the extension coefficient or envelope coefficients corresponding to the spectral characteristics by decoding and inverse quantization, and generates the extended spectrum using the obtained extension coefficient or envelope coefficients.
In this way, by adjusting the level of the seed spectrum or of random noise using the extension coefficient or envelope coefficients corresponding to the spectral characteristics to obtain the extended spectrum, the high-band level can be adjusted in the frequency domain, and high-band level adjustment suited to the spectral characteristics can be achieved. This reduces the delay caused by band extension at decoding time and makes it possible to obtain high-quality sound even in a low-resource environment.
Incidentally, the series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed in a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions when various programs are installed.
FIG. 18 is a block diagram illustrating a configuration example of the hardware of a computer that executes the series of processes described above by means of a program.
In the computer, a CPU (Central Processing Unit) 501, a ROM 502, and a RAM 503 are connected to one another by a bus 504.
An input/output interface 505 is further connected to the bus 504. An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input/output interface 505.
The input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like. The output unit 507 includes a display, a speaker, and the like. The recording unit 508 includes a hard disk, a nonvolatile memory, and the like. The communication unit 509 includes a network interface and the like. The drive 510 drives a removable medium 511 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory.
In the computer configured as described above, the series of processes described above is performed when the CPU 501, for example, loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it.
The program executed by the computer (CPU 501) can be provided by being recorded on the removable medium 511 as a package medium or the like. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
In the computer, the program can be installed in the recording unit 508 via the input/output interface 505 by loading the removable medium 511 into the drive 510. The program can also be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. Alternatively, the program can be installed in advance in the ROM 502 or the recording unit 508.
The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at required timing, such as when a call is made.
The embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.
For example, the present technology can adopt a cloud computing configuration in which one function is shared and processed jointly by a plurality of devices via a network.
Each step described in the above flowcharts can be executed by one device or shared among a plurality of devices.
Further, when one step includes a plurality of processes, the plurality of processes included in that step can be executed by one device or shared among a plurality of devices.
Furthermore, the present technology can also be configured as follows.
[1]
A decoding device including:
an acquisition unit that acquires a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
a generation unit that generates the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
a combining unit that combines the low-band spectrum and the extended spectrum.
[2]
The decoding device according to [1], wherein the generation unit generates the extended spectrum based on the low-band spectrum and the extension coefficient.
[3]
The decoding device according to [2], wherein the generation unit generates the extended spectrum by adjusting, based on the extension coefficient, the level of a spectrum obtained from the low-band spectrum.
[4]
The decoding device according to [3], wherein the generation unit, when generating the extended spectrum based on the single extension coefficient, adjusts the level of the spectrum over the entire extension band based on that extension coefficient, and, when generating the extended spectrum based on the extension coefficients for the plurality of bands, adjusts the level of the spectrum in each band based on the extension coefficient of that band.
[5]
The decoding device according to [1], wherein the generation unit generates the extended spectrum by adjusting the level of predetermined noise based on the extension coefficient.
[6]
The decoding device according to any one of [1] to [5], wherein the value of the low-band spectrum is determined by an amplitude component and a phase component of the original time-series signal.
[7]
The decoding device according to [6], wherein the low-band spectrum is an MDCT spectrum.
[8]
A decoding method including the steps of:
acquiring a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
generating the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
combining the low-band spectrum and the extended spectrum.
[9]
A program that causes a computer to execute processing including the steps of:
acquiring a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
generating the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
combining the low-band spectrum and the extended spectrum.
[10]
An encoding device including:
a feature amount extraction unit that extracts a feature amount from a spectrum obtained by orthogonally transforming a time-series signal;
a calculation unit that calculates, based on the spectrum and according to the feature amount, either a single extension coefficient for an extension band different from the low band of the spectrum, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; and
a multiplexing unit that multiplexes a low-band spectrum, which is the low-band component of the spectrum, and the extension coefficient to generate a code string.
[11]
The encoding device according to [10], wherein the feature amount is information indicating the tonality of the spectrum.
[12]
The encoding device according to [11], wherein the calculation unit calculates the single extension coefficient when the tonality of the spectrum is high, and calculates the extension coefficient for each of the plurality of bands when the tonality of the spectrum is low.
[13]
The encoding device according to any one of [10] to [12], wherein the calculation unit calculates, as the extension coefficient, the ratio between the average amplitude of the extension band of the spectrum and the average amplitude of the low-band spectrum.
[14]
The encoding device according to [11], wherein the calculation means calculates, as the extension coefficient, envelope information of the extension band of the spectrum when the tonality of the low band of the spectrum is high and the tonality of the extension band of the spectrum is low.
[15]
The encoding device according to any one of [10] to [14], wherein the value of the spectrum is determined by an amplitude component and a phase component of the time-series signal.
[16]
The encoding device according to [15], wherein the orthogonal transform is an MDCT.
[17]
An encoding method including the steps of:
extracting a feature amount from a spectrum obtained by orthogonally transforming a time-series signal;
calculating, based on the spectrum and according to the feature amount, either a single extension coefficient for an extension band different from the low band of the spectrum, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; and
multiplexing a low-band spectrum, which is the low-band component of the spectrum, and the extension coefficient to generate a code string.
[18]
A program that causes a computer to execute processing including the steps of:
extracting a feature amount from a spectrum obtained by orthogonally transforming a time-series signal;
calculating, based on the spectrum and according to the feature amount, either a single extension coefficient for an extension band different from the low band of the spectrum, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band; and
multiplexing a low-band spectrum, which is the low-band component of the spectrum, and the extension coefficient to generate a code string.
11 encoding device, 21 MDCT unit, 22 spectrum quantization unit, 23 low-band feature amount extraction unit, 24 high-band feature amount extraction unit, 25 spectral characteristic determination unit, 26 extension coefficient calculation unit, 27 extension coefficient quantization unit, 28 multiplexing unit, 81 decoding device, 91 decomposition unit, 92 spectrum inverse quantization unit, 93 extension coefficient inverse quantization unit, 94 extended spectrum generation unit, 95 IMDCT unit

Claims (18)

  1.  A decoding device comprising:
     an acquisition unit that acquires a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
     a generation unit that generates the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
     a combining unit that combines the low-band spectrum and the extended spectrum.
  2.  The decoding device according to claim 1, wherein the generation unit generates the extended spectrum based on the low-band spectrum and the extension coefficient.
  3.  The decoding device according to claim 2, wherein the generation unit generates the extended spectrum by adjusting, based on the extension coefficient, the level of a spectrum obtained from the low-band spectrum.
  4.  The decoding device according to claim 3, wherein the generation unit, when generating the extended spectrum based on the single extension coefficient, adjusts the level of the spectrum over the entire extension band based on that extension coefficient, and, when generating the extended spectrum based on the extension coefficients for the plurality of bands, adjusts the level of the spectrum in each band based on the extension coefficient of that band.
  5.  The decoding device according to claim 1, wherein the generation unit generates the extended spectrum by adjusting the level of predetermined noise based on the extension coefficient.
  6.  The decoding device according to claim 1, wherein the value of the low-band spectrum is determined by an amplitude component and a phase component of the original time-series signal.
  7.  The decoding device according to claim 6, wherein the low-band spectrum is an MDCT spectrum.
  8.  A decoding method comprising the steps of:
     acquiring a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
     generating the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
     combining the low-band spectrum and the extended spectrum.
  9.  A program that causes a computer to execute processing comprising the steps of:
     acquiring a low-band spectrum and either a single extension coefficient for an extension band different from the low band, the extension coefficient being used to obtain an extended spectrum of the extension band, or an extension coefficient for each of a plurality of bands constituting the extension band;
     generating the extended spectrum based on the single extension coefficient or the extension coefficients for the plurality of bands; and
     combining the low-band spectrum and the extended spectrum.
  10.  時系列信号を直交変換して得られたスペクトルから特徴量を抽出する特徴量抽出部と、
     前記特徴量に応じて、前記スペクトルの低域とは異なる拡張帯域の拡張スペクトルを得るための前記拡張帯域に対する単一の拡張係数、または前記拡張帯域を構成する複数の帯域ごとの拡張係数を前記スペクトルに基づいて算出する算出部と、
     前記スペクトルの低域成分である低域スペクトルと、前記拡張係数とを多重化して符号列を生成する多重化部と
     を備える符号化装置。
    A feature amount extraction unit that extracts a feature amount from a spectrum obtained by orthogonally transforming a time series signal;
    Depending on the feature amount, a single extension coefficient for the extension band for obtaining an extension spectrum of an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, A calculation unit for calculating based on the spectrum;
    An encoding apparatus comprising: a multiplexing unit that multiplexes a low frequency spectrum that is a low frequency component of the spectrum and the extension coefficient to generate a code string.
  11.  The encoding device according to claim 10, wherein the feature amount is information indicating the tonality of the spectrum.
  12.  The encoding device according to claim 11, wherein the calculation unit calculates the single extension coefficient when the tonality of the spectrum is high, and calculates the extension coefficient for each of the plurality of bands when the tonality of the spectrum is low.
  13.  The encoding device according to claim 10, wherein the calculation unit calculates, as the extension coefficient, the ratio between the average amplitude of the extension band of the spectrum and the average amplitude of the low-band spectrum.
  14.  The encoding device according to claim 11, wherein the calculation unit calculates envelope information of the extension band of the spectrum as the extension coefficient when the tonality of the low band of the spectrum is high and the tonality of the extension band of the spectrum is low.
  15.  The encoding device according to claim 10, wherein the values of the spectrum are determined by an amplitude component and a phase component of the time-series signal.
  16.  The encoding device according to claim 15, wherein the orthogonal transform is an MDCT.
  17.  An encoding method including the steps of:
     extracting a feature amount from a spectrum obtained by orthogonally transforming a time-series signal;
     calculating, based on the spectrum and in accordance with the feature amount, either a single extension coefficient for an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, the extension coefficient being used to obtain an extension spectrum of the extension band; and
     multiplexing a low-band spectrum, which is the low-band component of the spectrum, with the extension coefficient to generate a code string.
  18.  A program that causes a computer to execute processing including the steps of:
     extracting a feature amount from a spectrum obtained by orthogonally transforming a time-series signal;
     calculating, based on the spectrum and in accordance with the feature amount, either a single extension coefficient for an extension band different from the low band of the spectrum, or an extension coefficient for each of a plurality of bands constituting the extension band, the extension coefficient being used to obtain an extension spectrum of the extension band; and
     multiplexing a low-band spectrum, which is the low-band component of the spectrum, with the extension coefficient to generate a code string.
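
The encoding and decoding steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions that do not come from the claims: tonality is approximated by one minus the spectral flatness, the high/low tonality decision uses an arbitrary threshold of 0.5, the "code string" is modelled as a plain tuple, and the decoder generates the extension spectrum by copying the low-band spectrum and scaling it by the extension coefficient. All function and parameter names (`tonality`, `band_slices`, `encode`, `decode`, `threshold`) are hypothetical. The coefficient itself follows claim 13: the ratio of the average amplitude of the extension band to the average amplitude of the low-band spectrum.

```python
import math


def mean_abs(values):
    """Average absolute amplitude of a list of spectral values."""
    return sum(abs(v) for v in values) / len(values)


def tonality(spectrum, eps=1e-12):
    """Assumed tonality measure: 1 - spectral flatness (close to 1 for peaky spectra)."""
    power = [v * v + eps for v in spectrum]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return 1.0 - geometric / arithmetic


def band_slices(length, n_bands):
    """Split range(length) into n_bands contiguous bands; the last band takes any remainder."""
    width = length // n_bands
    return [(i * width, length if i == n_bands - 1 else (i + 1) * width)
            for i in range(n_bands)]


def encode(spectrum, n_low, n_bands, threshold=0.5, eps=1e-12):
    """Return (low-band spectrum, extension coefficients) as a stand-in for the code string."""
    low, ext = spectrum[:n_low], spectrum[n_low:]
    low_amp = max(mean_abs(low), eps)  # guard against an all-zero low band
    if tonality(spectrum) >= threshold:
        # High tonality: one coefficient for the whole extension band.
        coeffs = [mean_abs(ext) / low_amp]
    else:
        # Low tonality: one coefficient per band of the extension band.
        coeffs = [mean_abs(ext[a:b]) / low_amp
                  for a, b in band_slices(len(ext), n_bands)]
    return low, coeffs


def decode(low, coeffs, n_ext):
    """Generate the extension spectrum from the coefficients and append it to the low band."""
    ext = [0.0] * n_ext
    for coeff, (a, b) in zip(coeffs, band_slices(n_ext, len(coeffs))):
        for k in range(a, b):
            # Assumption: the extension spectrum is a scaled copy of the low-band spectrum.
            ext[k] = coeff * low[k % len(low)]
    return low + ext
```

As a usage example under these assumptions, `encode([1.0, -1.0, 1.0, -1.0, 0.5, -0.5, 0.25, -0.25], n_low=4, n_bands=2)` falls below the tonality threshold and therefore produces one coefficient per band, and `decode` applied to the result with `n_ext=4` reproduces the input spectrum. A spectrum dominated by a single peak would instead yield a single coefficient for the whole extension band.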
PCT/JP2015/070924 2014-08-06 2015-07-23 Coding device and method, decoding device and method, and program WO2016021412A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP15830713.2A EP3179476B1 (en) 2014-08-06 2015-07-23 Coding device and method, and program
EP19199364.1A EP3608910B1 (en) 2014-08-06 2015-07-23 Decoding device and method, and program
US15/500,253 US10049677B2 (en) 2014-08-06 2015-07-23 Encoding device and method, decoding device and method, and program
CN201580041640.XA CN106663449B (en) 2014-08-06 2015-07-23 Encoding device and method, decoding device and method, and program
US16/037,574 US10510353B2 (en) 2014-08-06 2018-07-17 Encoding device and method, decoding device and method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014160417A JP2016038435A (en) 2014-08-06 2014-08-06 Encoding device and method, decoding device and method, and program
JP2014-160417 2014-08-06

Related Child Applications (2)

Application Number Priority Date Filing Date Title
US15/500,253 A-371-Of-International US10049677B2 (en) 2014-08-06 2015-07-23 Encoding device and method, decoding device and method, and program
US16/037,574 Continuation US10510353B2 (en) 2014-08-06 2018-07-17 Encoding device and method, decoding device and method, and program

Publications (1)

Publication Number Publication Date
WO2016021412A1 (en) 2016-02-11

Family

ID=55263684

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/070924 WO2016021412A1 (en) 2014-08-06 2015-07-23 Coding device and method, decoding device and method, and program

Country Status (5)

Country Link
US (2) US10049677B2 (en)
EP (2) EP3179476B1 (en)
JP (1) JP2016038435A (en)
CN (1) CN106663449B (en)
WO (1) WO2016021412A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016038435A (en) 2014-08-06 2016-03-22 ソニー株式会社 Encoding device and method, decoding device and method, and program
JP6693551B1 (en) * 2018-11-30 2020-05-13 株式会社ソシオネクスト Signal processing device and signal processing method
JP7178506B2 (en) * 2019-02-21 2022-11-25 テレフオンアクチーボラゲット エルエム エリクソン(パブル) Method and Associated Controller for Phase ECU F0 Interpolation Split
CN110070884B (en) * 2019-02-28 2022-03-15 北京字节跳动网络技术有限公司 Audio starting point detection method and device
WO2021176842A1 (en) * 2020-03-04 2021-09-10 ソニーグループ株式会社 Decoding device, decoding method, program, encoding device, and encoding method
CN113190508B (en) * 2021-04-26 2023-05-05 重庆市规划和自然资源信息中心 Management-oriented natural language recognition method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011248378A (en) * 2004-05-19 2011-12-08 Panasonic Corp Encoder, decoder, and method therefor
JP2013178546A (en) * 2005-07-15 2013-09-09 Microsoft Corp Frequency segmentation for obtaining band for efficient coding of digital media

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5329714B1 (en) 1969-09-16 1978-08-22
JPS5325293B2 (en) 1973-05-02 1978-07-26
JPS5325293A (en) 1976-08-20 1978-03-08 Sumitomo Metal Ind Ltd Cooling and recovering method for granular metallurgical slag
NL7609610A (en) 1976-08-30 1978-03-02 Philips Nv METHOD FOR MAKING COPIES OF TRACKS OF INFORMATION ON CARRIERS.
US7555434B2 (en) * 2002-07-19 2009-06-30 Nec Corporation Audio decoding device, decoding method, and program
KR20070084002A (en) * 2004-11-05 2007-08-24 마츠시타 덴끼 산교 가부시키가이샤 Scalable decoding apparatus and scalable encoding apparatus
JP4899359B2 (en) * 2005-07-11 2012-03-21 ソニー株式会社 Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
CN101086845B (en) * 2006-06-08 2011-06-01 北京天籁传音数字技术有限公司 Sound coding device and method and sound decoding device and method
EP1870880B1 (en) * 2006-06-19 2010-04-07 Sharp Kabushiki Kaisha Signal processing method, signal processing apparatus and recording medium
EP2063418A4 (en) * 2006-09-15 2010-12-15 Panasonic Corp Audio encoding device and audio encoding method
EP2212884B1 (en) * 2007-11-06 2013-01-02 Nokia Corporation An encoder
EP2239731B1 (en) * 2008-01-25 2018-10-31 III Holdings 12, LLC Encoding device, decoding device, and method thereof
BRPI0910528B1 (en) * 2008-07-11 2020-09-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. INSTRUMENT AND METHOD FOR GENERATING EXTENDED BANDWIDTH SIGNAL
CN101727906B (en) * 2008-10-29 2012-02-01 华为技术有限公司 Method and device for coding and decoding of high-frequency band signals
EP2945159B1 (en) * 2008-12-15 2018-03-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder and bandwidth extension decoder
US8463599B2 (en) * 2009-02-04 2013-06-11 Motorola Mobility Llc Bandwidth extension method and apparatus for a modified discrete cosine transform audio coder
JP5511785B2 (en) * 2009-02-26 2014-06-04 パナソニック株式会社 Encoding device, decoding device and methods thereof
PL2273493T3 (en) * 2009-06-29 2013-07-31 Fraunhofer Ges Forschung Bandwidth extension encoding and decoding
CN101996640B (en) * 2009-08-31 2012-04-04 华为技术有限公司 Frequency band expansion method and device
JP5754899B2 (en) * 2009-10-07 2015-07-29 ソニー株式会社 Decoding apparatus and method, and program
CN102612712B (en) * 2009-11-19 2014-03-12 瑞典爱立信有限公司 Bandwidth extension of low band audio signal
JP5651980B2 (en) * 2010-03-31 2015-01-14 ソニー株式会社 Decoding device, decoding method, and program
JP5850216B2 (en) * 2010-04-13 2016-02-03 ソニー株式会社 Signal processing apparatus and method, encoding apparatus and method, decoding apparatus and method, and program
US9047875B2 (en) * 2010-07-19 2015-06-02 Futurewei Technologies, Inc. Spectrum flatness control for bandwidth extension
JP5707842B2 (en) * 2010-10-15 2015-04-30 ソニー株式会社 Encoding apparatus and method, decoding apparatus and method, and program
ES2745141T3 (en) * 2011-02-18 2020-02-27 Ntt Docomo Inc Speech decoder, speech encoder, speech decoding method, speech encoding method, speech decoding program, and speech encoding program
KR102078865B1 (en) * 2011-06-30 2020-02-19 삼성전자주식회사 Apparatus and method for generating a bandwidth extended signal
CN102208188B (en) * 2011-07-13 2013-04-17 华为技术有限公司 Audio signal encoding-decoding method and device
CN106847295B (en) * 2011-09-09 2021-03-23 松下电器(美国)知识产权公司 Encoding device and encoding method
EP3029672B1 (en) * 2012-02-23 2017-09-13 Dolby International AB Method and program for efficient recovery of high frequency audio content
CN108831501B (en) * 2012-03-21 2023-01-10 三星电子株式会社 High frequency encoding/decoding method and apparatus for bandwidth extension
JP5997592B2 (en) * 2012-04-27 2016-09-28 株式会社Nttドコモ Speech decoder
JP2016038435A (en) 2014-08-06 2016-03-22 ソニー株式会社 Encoding device and method, decoding device and method, and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3179476A4 *

Also Published As

Publication number Publication date
EP3608910A1 (en) 2020-02-12
US10049677B2 (en) 2018-08-14
US20180322885A1 (en) 2018-11-08
JP2016038435A (en) 2016-03-22
EP3179476A4 (en) 2018-01-03
US20170270940A1 (en) 2017-09-21
EP3179476B1 (en) 2019-10-09
US10510353B2 (en) 2019-12-17
EP3179476A1 (en) 2017-06-14
CN106663449B (en) 2021-03-16
CN106663449A (en) 2017-05-10
EP3608910B1 (en) 2021-08-25

Similar Documents

Publication Publication Date Title
US11705146B2 (en) Audio encoder and bandwidth extension decoder
US8639500B2 (en) Method, medium, and apparatus with bandwidth extension encoding and/or decoding
WO2016021412A1 (en) Coding device and method, decoding device and method, and program
KR101747918B1 (en) Method and apparatus for decoding high frequency signal
KR101375582B1 (en) Method and apparatus for bandwidth extension encoding and decoding
JP2010079275A (en) Device and method for expanding frequency band, device and method for encoding, device and method for decoding, and program
JP5651980B2 (en) Decoding device, decoding method, and program
JP2004053895A (en) Device and method for audio decoding, and program
CN110556122A (en) frequency band extension method, device, electronic equipment and computer readable storage medium
JPWO2015151451A1 (en) Encoding device, decoding device, encoding method, decoding method, and program
JP2004053940A (en) Audio decoding device and method
AU2015203736B2 (en) Audio encoder and bandwidth extension decoder
JP5892395B2 (en) Encoding apparatus, encoding method, and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15830713; Country of ref document: EP; Kind code of ref document: A1)
REEP Request for entry into the European phase (Ref document number: 2015830713; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 15500253; Country of ref document: US; Ref document number: 2015830713; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)