WO2021075167A1 - Quantization scale coefficient determination device and quantization scale coefficient determination method - Google Patents
- Publication number: WO2021075167A1 (application PCT/JP2020/033579)
- Authority: WO, WIPO (PCT)
- Prior art keywords: quantization scale, scale coefficient, search, spectrum, sparsity
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/035—Scalar quantisation
Definitions
- the present disclosure relates to a quantization scale coefficient determining device and a quantization scale coefficient determining method.
- MDCT Modified Discrete Cosine Transform
- audio signal or acoustic signal: also called an “audio-acoustic signal”
- the MDCT spectrum is scaled (also referred to as quantization scaling), quantized, and arithmetically coded (see, for example, Patent Document 1).
- the non-limiting embodiment of the present disclosure contributes to the provision of a quantization scale coefficient determining device capable of reducing the amount of calculation in coding an audio signal or an acoustic signal, and a method for determining a quantization scale coefficient.
- the quantization scale coefficient determining device includes a correction circuit that corrects an initial value of the quantization scale coefficient based on whether or not the spectrum of the audio-acoustic signal has sparsity, and a search circuit that searches for the quantization scale coefficient based on the corrected initial value.
- the amount of calculation in coding an audio signal or an acoustic signal can be reduced.
- block diagram showing a configuration example of an audio signal or acoustic signal transmission system
- block diagram showing a configuration example of the TCX coding unit
- block diagram showing a configuration example of the rate loop processing unit and the quantization / coding unit
- block diagram showing a configuration example of the sparse analysis unit
- diagram showing examples of spectra that have sparsity
- diagram showing an example of the correction processing of the quantization scale coefficient based on sparsity
- diagram showing an example of the sparsity determination conditions
- diagram showing an example of the search process of the quantization scale coefficient
- in Patent Document 1, for example, the value obtained by multiplying the envelope of the MDCT spectrum, obtained based on linear predictive analysis (for example, linear predictive coding (LPC) analysis), by the absolute value of the MDCT spectrum.
- linear predictive analysis: for example, linear predictive coding (LPC) analysis
- LPC: linear predictive coding
- RMS: root mean square
- the coding device performs a search process for the quantization scale coefficient based on the initial value of the quantization scale coefficient. For example, the coding device estimates the amount of bits consumed by arithmetic coding of the MDCT spectrum (referred to as the "consumed bit amount") from an approximate expression based on the quantization scale coefficient. The coding device then compares the estimated consumed bit amount with the target bit amount and uses the binary search (bisection) method to search for a quantization scale coefficient that satisfies, for example, the conditions "does not exceed the target bit amount" and "is closest to the target bit amount".
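The bisection described above can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: the bit estimator `toy_bits` is a hypothetical stand-in for the unspecified approximate expression, and the function name `binary_search_scale` and the search bracket are assumptions. It assumes the estimated bit amount never decreases as the scale coefficient grows.

```python
import math
from typing import Callable


def binary_search_scale(
    estimate_bits: Callable[[float], float],
    target_bits: float,
    lo: float,
    hi: float,
    max_iter: int = 30,
) -> float:
    """Largest scale found in [lo, hi] whose estimated bits do not exceed target_bits.

    Assumes estimate_bits is non-decreasing in the scale and estimate_bits(lo) <= target_bits.
    """
    best = lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if estimate_bits(mid) <= target_bits:
            best = mid  # feasible: does not exceed the target, so try a larger scale
            lo = mid
        else:
            hi = mid  # too many bits: shrink the scale
    return best


# Hypothetical bit estimate; NOT the approximate expression used in the source.
def toy_bits(scale: float) -> float:
    return 100.0 * math.log2(1.0 + scale)


scl = binary_search_scale(toy_bits, target_bits=300.0, lo=0.0, hi=64.0)
print(scl)  # close to 7.0, because 100 * log2(1 + 7) = 300
```

Keeping the best feasible midpoint, rather than the last one tried, enforces the "does not exceed the target" condition while approaching the target from below.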
- the binary search method is known to converge slowly.
- FIG. 1 shows a configuration example of an audio signal or acoustic signal transmission system according to the present embodiment.
- the transmission system shown in FIG. 1 includes, for example, a coding device 1 and a decoding device 2.
- the coding device 1 encodes an input signal such as an audio signal or an acoustic signal, and transmits the coded data to the decoding device 2 via a communication network or a storage medium (not shown).
- the coding device 1 may include an encoder for any of various audio-acoustic codecs defined in standards such as Moving Picture Experts Group (MPEG), 3rd Generation Partnership Project (3GPP), or International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
- the decoding device 2 decodes the coded data received from the coding device 1 via, for example, a transmission line or a storage medium, and outputs an output signal (for example, an electric signal).
- the decoding device 2 may output, for example, an electric signal as a sound wave via a speaker or headphones. Further, the decoding device 2 may use, for example, a decoder corresponding to the above-mentioned audio-acoustic codec.
- the codec in the coding device 1 may include, for example, transform coded excitation (TCX) coding, a form of frequency-domain coding.
- TCX: transform coded excitation
- the coding device 1 shown in FIG. 1 includes a TCX coding unit 10 that performs TCX coding processing.
- TCX coding may be applied to coding in low bit rate transmissions such as 13.2 kbps or 16.4 kbps.
- the transmission bit rate to which TCX coding is applied is not limited to 13.2 kbps and 16.4 kbps, and may be another bit rate.
- TCX coding using MDCT to code the excitation signal is sometimes called, for example, "MDCT based TCX".
- FIG. 2 shows a configuration example of the TCX coding unit 10 included in the coding device 1 shown in FIG. 1.
- the TCX coding unit 10 shown in FIG. 2 includes, for example, an envelope generation unit 11, a harmonics analysis unit 12, an envelope scaling unit 13, a rate loop processing unit 14, and a quantization / coding unit 15.
- a frequency domain signal (hereinafter referred to as "MDCT spectrum") obtained by MDCT with respect to the input signal and an LPC coefficient obtained by LPC analysis with respect to the input signal are input to the envelope generation unit 11.
- the envelope generation unit 11 generates an envelope of the MDCT spectrum based on, for example, the LPC coefficient.
- the envelope generation unit 11 outputs the envelope information indicating the generated envelope and the spectrum information indicating the MDCT spectrum to the harmonics analysis unit 12.
- the harmonics analysis unit 12 analyzes the harmonics structure (in other words, harmonic components) in the MDCT spectrum based on the information input from the envelope generation unit 11, for example.
- the harmonics analysis unit 12 outputs, for example, harmonics information indicating the analysis result of the harmonics structure, together with the envelope information and the spectrum information, to the envelope scaling unit 13.
- the harmonics information may include information indicating whether or not the MDCT spectrum has a harmonics structure (for example, referred to as a "harmonics flag” or a "harmonic model flag”).
- the harmonics information may include, for example, an index indicating the gain of the harmonics (referred to, for example, as the “harmonics gain index”).
- the harmonics gain index may be, for example, a value obtained by indexing (in other words, quantizing) the harmonics gain for each level. For example, the higher the value of the harmonics gain index, the higher the harmonics gain level may be.
- the envelope scaling unit 13 performs scaling processing on the envelope of the MDCT spectrum based on the information input from the harmonics analysis unit 12, for example.
- the envelope scaling unit 13 outputs envelope information, harmonics information, and spectrum information indicating the scaled envelope to the rate loop processing unit 14.
- the rate loop processing unit 14 performs rate loop processing (also referred to as quantization rate loop processing) based on the information input from the envelope scaling unit 13, and calculates the quantization scale coefficient used in the quantization of the MDCT spectrum.
- the rate loop processing unit 14 searches for a quantization scale coefficient based on, for example, a comparison between the amount of consumed bits and the amount of target bits.
- the search method may be, for example, a binary search method or another search method.
- the rate loop processing unit 14 may set the initial value of the quantization scale coefficient in the search based on the sparsity in the MDCT spectrum, for example. An example of how to set the initial value of the quantization scale coefficient in the rate loop processing unit 14 will be described later.
- the rate loop processing unit 14 outputs the information indicating the searched quantization scale coefficient and the spectral information to the quantization / coding unit 15.
- the quantization / coding unit 15 quantizes and encodes the MDCT spectrum based on the information input from the rate loop processing unit 14, and outputs the obtained coded data.
- FIG. 3 shows a configuration example of the rate loop processing unit 14 (for example, corresponding to the quantization scale coefficient determining device) and the quantization / coding unit 15 included in the TCX coding unit 10 shown in FIG. 2.
- the rate loop processing unit 14 shown in FIG. 3 includes, for example, a quantization scale coefficient calculation unit 141 (for example, corresponding to a calculation circuit), a sparse analysis unit 142, and a quantization scale coefficient search unit 143 (for example, corresponding to a search circuit). Further, the quantization / coding unit 15 shown in FIG. 3 includes, for example, a quantization unit 151 and a coding unit 152.
- the quantization scale coefficient calculation unit 141 calculates the initial value of the quantization scale coefficient used in the quantization processing of the MDCT spectrum, based on, for example, the envelope information and the spectrum information input from the envelope scaling unit 13.
- the quantization scale coefficient calculation unit 141 may set the reciprocal of the standard deviation of the product of the envelope (for example, the envelope obtained based on the LPC analysis) and the absolute value of the MDCT spectrum (in other words, the amplitude spectrum normalized by the spectral envelope) as the initial value of the quantization scale coefficient (sometimes referred to as the "uncorrected quantization scale coefficient").
- the quantization scale coefficient calculation unit 141 outputs information indicating the quantization scale coefficient before correction to the sparse analysis unit 142.
- the calculation method of the quantization scale coefficient in the quantization scale coefficient calculation unit 141 is not limited to the above-mentioned method.
- the quantization scale coefficient calculation unit 141 may set the reciprocal of the variance of the product of the envelope and the absolute value of the MDCT spectrum as the initial value of the quantization scale coefficient.
- the quantization scale coefficient calculation unit 141 may set the reciprocal of the root mean square of the product of the envelope and the MDCT spectrum (optionally multiplied by a predetermined coefficient) as the initial value of the quantization scale coefficient.
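The three initial-value variants above (reciprocal of the standard deviation, of the variance, or of the root mean square of the envelope-weighted amplitude) can be compared in a short sketch. The function name `initial_scale`, the use of population rather than sample statistics, and applying the optional predetermined coefficient to every variant are assumptions made for illustration.

```python
import math
import statistics
from typing import Sequence


def initial_scale(envelope: Sequence[float], spectrum: Sequence[float],
                  method: str = "std", coeff: float = 1.0) -> float:
    """Hypothetical helper: uncorrected quantization scale coefficient."""
    # Product of the envelope and the absolute MDCT spectrum, bin by bin.
    y = [e * abs(x) for e, x in zip(envelope, spectrum)]
    if method == "std":
        spread = statistics.pstdev(y)
    elif method == "var":
        spread = statistics.pvariance(y)
    elif method == "rms":
        spread = math.sqrt(sum(v * v for v in y) / len(y))
    else:
        raise ValueError(f"unknown method: {method}")
    if spread == 0:
        raise ValueError("flat weighted spectrum: reciprocal is undefined")
    return coeff / spread


env = [1.0, 1.0, 2.0, 2.0]
mdct = [0.0, 0.0, -2.0, 2.0]  # weighted amplitudes become [0, 0, 4, 4]
for m in ("std", "var", "rms"):
    print(m, initial_scale(env, mdct, m))
```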
- the sparse analysis unit 142 analyzes (in other words, determines) the sparseness of the MDCT spectrum based on at least one of harmonics information, spectrum information, and envelope information, for example.
- “Sparsity” is, for example, the property that, in the distribution of the MDCT spectrum, a small number of spectral components are non-zero while a large number of components are zero (or have amplitudes below a threshold).
- sparsity is, for example, a state in which a small number of spectral components account for a large share (for example, 50% or more) of the total sum of the spectral amplitudes.
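The amplitude-share notion of sparsity can be made concrete by counting how many of the largest components are needed to reach the given share of the total amplitude; a small count means the amplitude is concentrated in a few components. The helper name `count_for_share` and the example spectra are assumptions made for illustration.

```python
from typing import Sequence


def count_for_share(spectrum: Sequence[float], share: float = 0.5) -> int:
    """Smallest number of components whose amplitudes sum to >= share of the total."""
    amps = sorted((abs(x) for x in spectrum), reverse=True)
    total = sum(amps)
    if total == 0:
        return 0  # an all-zero spectrum has no amplitude to share
    acc = 0.0
    for k, a in enumerate(amps, start=1):
        acc += a
        if acc >= share * total:
            return k
    return len(amps)


peaky = [10.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
flat = [1.0] * 10
print(count_for_share(peaky), count_for_share(flat))  # 1 5
```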
- the sparse analysis unit 142 may determine, for example, whether or not to correct the quantization scale coefficient input from the quantization scale coefficient calculation unit 141 based on the analysis result of sparsity.
- when the sparse analysis unit 142 determines to correct the quantization scale coefficient, it corrects the quantization scale coefficient and outputs information indicating the corrected quantization scale coefficient to the quantization scale coefficient search unit 143.
- when the sparse analysis unit 142 does not correct the quantization scale coefficient, it outputs information indicating the quantization scale coefficient input from the quantization scale coefficient calculation unit 141 to the quantization scale coefficient search unit 143.
- the quantization scale coefficient search unit 143 searches for the quantization scale coefficient based on the initial value of the quantization scale coefficient input from the sparse analysis unit 142. For example, the quantization scale coefficient search unit 143 performs a binary search based on the result of comparing the consumed bit amount estimated for arithmetic coding with the target bit amount, and outputs information indicating the searched quantization scale coefficient to the quantization / coding unit 15 (quantization unit 151).
- the quantization unit 151 quantizes the MDCT spectrum based on the quantization scale coefficient input from the quantization scale coefficient search unit 143.
- the quantization unit 151 outputs information indicating the MDCT spectrum after quantization to the coding unit 152.
- the coding unit 152 encodes the quantized MDCT spectrum input from the quantization unit 151 and outputs the coded data.
- the coding method in the coding unit 152 may be, for example, arithmetic coding or other coding.
- FIG. 4 shows a configuration example of the sparse analysis unit 142.
- the sparse analysis unit 142 shown in FIG. 4 includes, for example, a preprocessing unit 1421 (for example, corresponding to a preprocessing circuit), a sparsity determination unit 1422 (for example, corresponding to a determination circuit), and a quantization scale coefficient correction unit 1423 (for example, corresponding to a correction circuit).
- the preprocessing unit 1421 performs preprocessing on, for example, the quantization scale coefficient (for example, the quantization scale coefficient (initial value) before correction) input from the quantization scale coefficient calculation unit 141.
- the preprocessing unit 1421 may adjust, for example, the upper limit of the quantization scale coefficient. Further, the preprocessing unit 1421 may multiply the quantization scale coefficient by a specific value (for example, a value less than 1.00).
- the preprocessing unit 1421 outputs information indicating the quantization scale coefficient after the preprocessing to the sparsity determination unit 1422.
- the sparsity determination unit 1422 determines whether or not the MDCT spectrum has sparsity. For example, the sparsity determination unit 1422 may determine the sparsity of the MDCT spectrum based on the envelope information, the harmonics information, and the information about the MDCT spectrum (for example, the absolute value of the MDCT spectrum).
- FIGS. 5 (a) to 5 (d) show an example of the MDCT spectrum in the case of having sparsity.
- the horizontal axis represents the frequency (for example, frequency bin), and the vertical axis represents the amplitude of the MDCT spectrum (for example, the absolute value of the amplitude).
- in an MDCT spectrum having a harmonics structure, the peaks of the MDCT spectrum appear concentrated at regular intervals, for example, as shown in FIG. 5(a) or FIG. 5(b).
- the MDCT spectrum at those intervals (in other words, the peak components)
- the MDCT spectrum at other frequencies (in other words, the components other than the peaks)
- an MDCT spectrum having a harmonic structure may have sparsity.
- energy may be concentrated in a part of the MDCT spectrum.
- some MDCT spectra where energy is concentrated may have higher amplitude (or power) than other MDCT spectra. Therefore, as shown in FIG. 5 (c) or FIG. 5 (d), the MDCT spectrum in which the energy is concentrated in a part of the spectrum may have sparsity.
- the sparsity determination unit 1422 may determine the sparsity based on, for example, the harmonics information. The sparsity determination unit 1422 may also determine the sparsity based on, for example, the number of spectral components that account for a ratio equal to or greater than a threshold (for example, 50%) of the MDCT spectrum (in other words, of the audio signal or acoustic signal). The sparsity determination unit 1422 may also determine the sparsity based on, for example, the envelope based on LPC analysis and the MDCT spectrum (for example, its absolute value). The determination is not limited to these parameters (or feature amounts), namely the harmonics information, the envelope information, and the MDCT spectrum (for example, its absolute value); sparsity may also be determined based on other parameters.
- the quantization scale coefficient correction unit 1423 corrects the initial value of the quantization scale coefficient based on, for example, whether or not the MDCT spectrum has sparsity. For example, the quantization scale coefficient correction unit 1423 corrects the quantization scale coefficient (initial value) when there is sparsity in the MDCT spectrum. On the other hand, the sparse analysis unit 142 does not correct the quantization scale coefficient, for example, when there is no sparse property in the MDCT spectrum.
- the quantization scale coefficient correction unit 1423 outputs the obtained quantization scale coefficient to the quantization / coding unit 15 (for example, FIG. 3).
- the quantization scale coefficient calculation unit 141 determines, as the quantization scale coefficient, for example, the reciprocal of the standard deviation of the product of the envelope obtained based on the LPC analysis (in other words, the scaled envelope) and the absolute value of the MDCT spectrum.
- when the MDCT spectrum has sparsity, the mean value of the MDCT spectrum can be lower.
- in that case, the energy or average amplitude of the entire MDCT spectrum (for example, corresponding to the above standard deviation) can be estimated to be lower than when the spectrum does not have sparsity. Therefore, when the MDCT spectrum has sparsity, the quantization scale coefficient determined by the quantization scale coefficient calculation unit 141 (for example, the reciprocal of the above standard deviation) can be larger than the quantization scale coefficient for a spectrum without sparsity, or than the quantization scale coefficient after the search.
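The effect described above can be checked numerically. Assuming a flat envelope (so the product reduces to the absolute spectrum), a spectrum whose energy sits in a few peaks has a smaller standard deviation than one whose energy is spread out, so the reciprocal used as the uncorrected initial value comes out larger. Both spectra and the helper name `uncorrected_scale` are made up for illustration, not data from the source.

```python
import statistics
from typing import Sequence


def uncorrected_scale(abs_spectrum: Sequence[float]) -> float:
    # Reciprocal of the population standard deviation (flat envelope assumed).
    return 1.0 / statistics.pstdev(abs_spectrum)


n_bins = 100
sparse = [100.0 if i % 20 == 0 else 0.0 for i in range(n_bins)]  # 5 isolated peaks
dense = [100.0 if i % 2 == 0 else 20.0 for i in range(n_bins)]   # energy spread out

print(uncorrected_scale(sparse), uncorrected_scale(dense))
```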
- FIG. 6 shows an example of correction processing of the quantization scale coefficient based on sparsity.
- FIG. 6 shows a quantization scale coefficient when the MDCT spectrum has sparseness (in other words, a quantization scale coefficient before correction) and a quantization scale coefficient after search (in other words, a quantization scale after correction). An example of the correspondence with the coefficient) is shown.
- the horizontal axis represents the quantization scale coefficient after the search (for example, binary search), and the vertical axis represents the quantization scale coefficient input to the sparseness determination unit 1422.
- the quantization scale coefficient input to the sparseness determination unit 1422 may be, for example, the quantization scale coefficient calculated by the quantization scale coefficient calculation unit 141 or the quantization scale coefficient adjusted by the preprocessing unit 1421.
- the quantization scale coefficient correction unit 1423 corrects (reduces) the quantization scale coefficient before correction (for example, scl_b) to a quantization scale coefficient (for example, scl_a).
- the method of correcting the quantization scale coefficient may be set based on the statistical relationship (for example, simulation results) between the quantization scale coefficient when there is sparsity and the quantization scale coefficient after the search.
- the parameter "1.85" is an example and is not limited to this value. Further, the method for correcting the quantization scale coefficient is not limited to the above method, and other methods may be used.
- the quantization scale coefficient search unit 143 can start the search from the corrected initial value of the quantization scale coefficient. For example, in FIG. 6, the quantization scale coefficient search unit 143 performs a binary search using the corrected quantization scale coefficient scl_a as the initial value. Compared with a binary search that uses the uncorrected quantization scale coefficient scl_b shown in FIG. 6 as the initial value, this reduces the number of search iterations until the binary search converges, that is, the amount of calculation.
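To see why a closer starting point saves work, the sketch below brackets the target by halving or doubling from the initial value and then bisects until the bracket is narrow. The bracketing strategy, the stopping rule `rel_tol`, the bit model `toy_bits`, and the two initial values are hypothetical; the source does not specify how the binary search uses its initial value.

```python
import math
from typing import Callable, Tuple


def search_from_initial(estimate_bits: Callable[[float], float], target_bits: float,
                        init: float, rel_tol: float = 0.01,
                        max_iter: int = 100) -> Tuple[float, int]:
    """Return (largest feasible scale found, number of search iterations)."""
    iters = 0
    lo = hi = init
    if estimate_bits(init) <= target_bits:
        # Initial value is feasible: double until the estimate exceeds the target.
        while estimate_bits(hi) <= target_bits and iters < max_iter:
            lo, hi = hi, hi * 2.0
            iters += 1
    else:
        # Initial value is too large: halve until the estimate fits the target.
        while estimate_bits(lo) > target_bits and iters < max_iter:
            lo, hi = lo * 0.5, lo
            iters += 1
    # Bisect the bracket [lo, hi]; lo always stays feasible.
    while hi - lo > rel_tol * hi and iters < max_iter:
        mid = 0.5 * (lo + hi)
        if estimate_bits(mid) <= target_bits:
            lo = mid
        else:
            hi = mid
        iters += 1
    return lo, iters


def toy_bits(scale: float) -> float:
    # Hypothetical monotone bit model; 300 bits are reached at scale 7.
    return 100.0 * math.log2(1.0 + scale)


scl_b, it_b = search_from_initial(toy_bits, 300.0, init=100.0)  # uncorrected start
scl_a, it_a = search_from_initial(toy_bits, 300.0, init=8.0)    # corrected start
print(it_b, it_a)
```

Both runs end near the same coefficient; the run started close to the answer simply needs fewer halvings before bisection begins.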
- the sparsity determination unit 1422 determines the sparsity based on whether or not the MDCT spectrum has a "harmonics structure" as shown in FIG. 5 (a) or FIG. 5 (b).
- the sparsity determination unit 1422 may determine the sparsity based on the harmonics flag, the harmonics gain index, and the average of the absolute values of the MDCT spectrum (hereinafter referred to as the “spectral average value”).
- the sparsity determination unit 1422 may determine that the MDCT spectrum has sparsity when the harmonics flag is ON (in other words, when the spectrum has a harmonics structure), the harmonics gain index is equal to or greater than a threshold (in other words, the harmonics gain is equal to or greater than a threshold), and the number of spectral components exceeding the spectral average value (also referred to as frequency bins or lines) is less than a threshold.
- otherwise, the sparsity determination unit 1422 may determine that the MDCT spectrum does not have sparsity.
- a plurality of threshold values for the harmonics gain index may be set. Further, in the determination condition 1, a plurality of threshold values for the number of spectra exceeding the spectral average value may be set.
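- pattern 1: the harmonics flag is ON, the harmonics gain index is equal to or greater than the threshold “X1”, and the number of spectra exceeding the spectral average value is less than the threshold “Y1”
- pattern 2: the harmonics flag is ON, the harmonics gain index is equal to or greater than the threshold “X2”, and the number of spectra exceeding the spectral average value is less than the threshold “Y2” (for example, Y2 = 85)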
- the values of the threshold values X1, X2, Y1 and Y2 are examples, and are not limited to these values. Further, here, the case where the sparsity is determined based on any of the conditions of the two patterns of the combination of X1 and Y1 and the combination of X2 and Y2 has been described, but the present invention is not limited to this.
- the combination pattern of the threshold value X regarding the harmonics gain index and the threshold value Y for the number of spectra exceeding the spectral average value may be one pattern or three or more patterns.
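The harmonics-based condition with several (X, Y) threshold patterns can be sketched as a single predicate. Only Y2 = 85 appears in the source; the other thresholds in `PATTERNS`, the comb-shaped test spectrum, and the function name `condition1_sparse` are hypothetical.

```python
from typing import Sequence, Tuple


def condition1_sparse(harmonics_flag: bool, gain_index: int,
                      abs_spectrum: Sequence[float],
                      patterns: Sequence[Tuple[int, int]]) -> bool:
    """Sparse if the flag is ON and any (X, Y) pattern holds:
    gain_index >= X and (number of bins above the spectral average) < Y."""
    if not harmonics_flag or not abs_spectrum:
        return False
    avg = sum(abs_spectrum) / len(abs_spectrum)  # spectral average value
    above = sum(1 for a in abs_spectrum if a > avg)
    return any(gain_index >= x and above < y for x, y in patterns)


# Hypothetical thresholds (X1, Y1) and (X2, Y2); only Y2 = 85 comes from the source.
PATTERNS = [(2, 40), (4, 85)]
comb = [50.0 if i % 8 == 0 else 1.0 for i in range(160)]  # 20 harmonic peaks
print(condition1_sparse(True, 3, comb, PATTERNS))
```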
- the sparsity determination unit 1422 determines sparsity based on the number of spectral components that account for a ratio (also referred to as the “composition ratio”) equal to or greater than a threshold of the MDCT spectrum, as shown in FIG.
- the sparsity determination unit 1422 may determine that the MDCT spectrum has sparsity when the number of spectral components that together account for a composition ratio equal to or greater than the threshold (for example, 50%) of the MDCT spectrum is equal to or less than the threshold L1.
- alternatively, the sparsity determination unit 1422 may determine that the MDCT spectrum has sparsity when the number of spectral components that together account for a composition ratio equal to or greater than the threshold (for example, 50%) of the MDCT spectrum is equal to or less than the threshold L1 and the root mean square of the absolute value of the MDCT spectrum (in other words, the power mean value or mean amplitude) exceeds the threshold L2.
- otherwise, the sparsity determination unit 1422 may determine that the MDCT spectrum does not have sparsity.
- determination condition 2 may be applied, for example, when the MDCT spectrum does not have a harmonic structure (an example will be described later).
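The composition-ratio condition, combined with the root-mean-square check, can be sketched as below. The thresholds passed as `l1` and `l2` are hypothetical (the source gives no values), as are the helper names and the test spectra.

```python
import math
from typing import Sequence


def count_for_share(abs_spectrum: Sequence[float], share: float = 0.5) -> int:
    """Number of largest components needed to reach `share` of the total amplitude."""
    amps = sorted((abs(a) for a in abs_spectrum), reverse=True)
    total = sum(amps)
    if total == 0:
        return 0
    acc = 0.0
    for k, a in enumerate(amps, start=1):
        acc += a
        if acc >= share * total:
            return k
    return len(amps)


def condition2_sparse(abs_spectrum: Sequence[float], l1: int, l2: float,
                      share: float = 0.5) -> bool:
    """Few components carry the share (<= l1) and the RMS amplitude exceeds l2."""
    if not abs_spectrum:
        return False
    rms = math.sqrt(sum(a * a for a in abs_spectrum) / len(abs_spectrum))
    return count_for_share(abs_spectrum, share) <= l1 and rms > l2


peaky = [100.0, 80.0] + [1.0] * 98   # two strong components
spread = [10.0] * 100                # amplitude spread evenly
quiet = [1.0, 0.5] + [0.0] * 98      # concentrated but very low level
print(condition2_sparse(peaky, 5, 5.0), condition2_sparse(spread, 5, 5.0),
      condition2_sparse(quiet, 5, 5.0))
```

The RMS check keeps near-silent frames, whose few non-zero bins trivially dominate the total, from being classified as sparse.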
- the sparsity determination unit 1422 determines sparsity based on the number of spectral components that account for a ratio (or composition ratio) equal to or greater than the threshold of the MDCT spectrum, as shown in FIG. 5(d).
- the sparsity determination unit 1422 may determine sparsity based on the ratio of the "maximum value of the product of the envelope and the absolute value of the MDCT spectrum" to the "root mean square".
- the sparsity determination unit 1422 may determine that the MDCT spectrum has sparsity when the number of spectral components that together account for a composition ratio equal to or greater than a threshold (for example, 50%) of the MDCT spectrum is equal to or less than the threshold L1, and the ratio of the "maximum value of the product of the envelope and the absolute value of the MDCT spectrum" to the "root mean square" is equal to or greater than the threshold L2.
- when the ratio of the "maximum value of the product of the envelope and the absolute value of the MDCT spectrum" to the "root mean square" is less than the threshold L2, the average power (or amplitude) relative to the maximum peak power (or amplitude) can be large. In that case, the power (or amplitude) is likely not concentrated in a part of the spectrum (in other words, it is dispersed), so the sparsity determination unit 1422 may determine that the MDCT spectrum does not have sparsity.
- the values of the parameter k and the threshold values L1 and L2 are examples, and are not limited to these values.
- the ratio is not limited to 50% and may be another ratio.
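The peak-to-RMS condition can be sketched in the same style. Taking the root mean square over the same envelope-weighted product as the maximum is an assumption, as are the thresholds `l1` and `l2`, the helper names, and the flat envelope used in the example.

```python
import math
from typing import Sequence


def count_for_share(abs_spectrum: Sequence[float], share: float = 0.5) -> int:
    """Number of largest components needed to reach `share` of the total amplitude."""
    amps = sorted((abs(a) for a in abs_spectrum), reverse=True)
    total = sum(amps)
    if total == 0:
        return 0
    acc = 0.0
    for k, a in enumerate(amps, start=1):
        acc += a
        if acc >= share * total:
            return k
    return len(amps)


def condition3_sparse(envelope: Sequence[float], spectrum: Sequence[float],
                      l1: int, l2: float, share: float = 0.5) -> bool:
    """Few components carry the share (<= l1) and max/RMS of env*|X| is >= l2."""
    y = [e * abs(x) for e, x in zip(envelope, spectrum)]
    if not y:
        return False
    rms = math.sqrt(sum(v * v for v in y) / len(y))  # assumed RMS of the product
    if rms == 0:
        return False
    return count_for_share(spectrum, share) <= l1 and max(y) / rms >= l2


env = [1.0] * 100
peaky = [100.0, 80.0] + [1.0] * 98  # peak-to-RMS ratio of about 7.8
spread = [10.0] * 100
print(condition3_sparse(env, peaky, 5, 4.0), condition3_sparse(env, spread, 5, 4.0))
```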
- the judgment conditions 1 to 3 have been described above.
- the determination condition 1 to the determination condition 3 may be combined.
- the sparsity determination conditions are not limited to determination conditions 1 to 3, and other determination conditions may be used.
- the sparsity determination unit 1422 may switch the determination condition used for the MDCT spectrum based on, for example, the uncorrected quantization scale coefficient (in other words, the initial value before correction) calculated from the MDCT spectrum.
- FIG. 7 shows an example of switching the determination conditions in the sparsity determination unit 1422.
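- for example, when the quantization scale coefficient before correction is less than the threshold n1, the sparsity determination unit 1422 applies determination condition 1 and determination condition 2.
- when the quantization scale coefficient before correction is equal to or greater than the threshold n1, determination condition 3 may be applied.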
- the threshold n1 may be determined, for example, based on whether the quantization scale coefficient corresponds to an MDCT spectrum that may have a harmonic structure. For example, the larger the peak amplitude of the MDCT spectrum and the smaller the average MDCT spectrum amplitude, the more likely the MDCT spectrum has a harmonic structure. Therefore, for example, when the quantization scale coefficient before correction is less than the threshold n1 (in other words, when the peak amplitude of the MDCT spectrum is large and the average MDCT spectrum amplitude is small), the sparsity determination unit 1422 may determine whether the spectrum has a harmonic structure when determining sparsity.
- on the other hand, when the quantization scale coefficient before correction is equal to or greater than the threshold n1 (in other words, when only a few MDCT spectral components have a large peak amplitude and the average MDCT spectrum amplitude is small), the sparsity determination unit 1422 need not determine whether the spectrum has a harmonic structure when determining sparsity.
- the threshold value n2 may be determined based on, for example, the lower limit of the amplitude level of the MDCT spectrum scaled by the quantization scale coefficient.
- for example, when the amplitude level of the MDCT spectrum is near 0, the quantization scale coefficient may be set so that the MDCT spectrum is quantized to 0, instead of setting a larger quantization scale coefficient.
- if an MDCT spectrum whose amplitude level is near 0 is forcibly quantized to a value larger than 0, the MDCT spectrum can be overscaled, depending on how the quantization scale coefficient is set.
- setting the threshold n2 sets the upper limit of the quantization scale coefficient, in other words, the lower limit of the amplitude level at which the MDCT spectrum is quantized.
- with the threshold n2, for example, a larger quantization scale coefficient is prevented from being set when the amplitude level of the MDCT spectrum is near 0, so excessive scaling of the MDCT spectrum can be suppressed.
- the sparsity determination unit 1422 does not have to determine the sparsity.
- the correction value of the quantization scale coefficient is not limited to the threshold value n2, and may be another value (for example, 0.05).
- the sparsity determination unit 1422 switches the sparsity determination conditions based on the quantization scale coefficient (in other words, the MDCT spectrum amplitude level) before correction.
- the sparsity determination unit 1422 can thus determine sparsity according to the characteristics of the MDCT spectrum (for example, its amplitude level or the presence or absence of a harmonic structure), which can improve the accuracy of the sparsity determination.
- the values of the threshold values n1 and n2 are examples, and other values may be used. Further, the threshold value may be one or three or more.
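Read together, the threshold discussion suggests a three-way switch on the uncorrected coefficient. The sketch below encodes one reading of it: below `n1` the harmonic-structure conditions 1 and 2 apply, from `n1` up to `n2` condition 3 applies, and at or above `n2` no sparsity determination is made and the coefficient is capped. The exact ranges, the cap behaviour, and the values of `N1` and `N2` are assumptions; the source gives no numbers for n1 and n2.

```python
from typing import Optional, Tuple


def select_conditions(scl_uncorrected: float, n1: float, n2: float) -> Tuple[str, ...]:
    """Which sparsity conditions to evaluate (assumes n1 < n2)."""
    if scl_uncorrected < n1:
        return ("condition1", "condition2")  # a harmonic structure is plausible
    if scl_uncorrected < n2:
        return ("condition3",)               # no harmonic-structure check
    return ()                                # skip the sparsity determination


def cap_scale(scl: float, n2: float, cap: Optional[float] = None) -> float:
    """Limit the coefficient at n2, or replace it with another correction value."""
    if scl >= n2:
        return n2 if cap is None else cap
    return scl


N1, N2 = 0.5, 2.0  # hypothetical thresholds
for scl in (0.3, 1.0, 2.5):
    print(scl, select_conditions(scl, N1, N2), cap_scale(scl, N2))
```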
- in this way, the initial value of the quantization scale coefficient is corrected based on whether or not the MDCT spectrum of the audio signal or acoustic signal has sparsity, and the quantization scale coefficient is searched for based on the corrected initial value.
- the initial value of the quantization scale coefficient is corrected to a value closer to, for example, the quantization scale coefficient obtained in the binary search.
- the quantization scale coefficient search unit 143 (for example, FIG. 3) may perform the search process shown in FIG.
- the quantization scale coefficient search unit 143 may calculate the quantization scale coefficient for the next search (denoted, for example, nx_scl) based on, for example, equation (1).
- t_bit represents the target bit amount
- bf_bit represents the consumed bit amount estimated for the arithmetic coding of the MDCT spectrum in the previous search
- cr_bit represents the consumed bit amount estimated for the arithmetic coding of the MDCT spectrum in the current search
- bf_scl represents the quantization scale coefficient in the previous search
- cr_scl represents the quantization scale coefficient in the current search
- the quantization scale coefficient search unit 143 determines the quantization scale coefficient nx_scl for the next search based on two differences: the difference n between the consumed bit amount cr_bit estimated for the arithmetic coding of the MDCT spectrum in the current search and the target bit amount t_bit, and the difference m between the consumed bit amount bf_bit estimated for the arithmetic coding of the MDCT spectrum in the previous search and the target bit amount t_bit. Note that nx_scl satisfies "bf_scl ≤ nx_scl ≤ cr_scl" or "cr_scl ≤ nx_scl ≤ bf_scl".
- the quantization scale coefficient search unit 143 weights the quantization scale coefficient used in each search based on the difference (for example, m and n) between the consumed bit amount estimated in that search and the target bit amount.
- the quantization scale coefficient search unit 143 sets a larger weight on the quantization scale coefficient cr_scl of the current search than on the quantization scale coefficient bf_scl of the previous search (for example,
- the quantization scale coefficient for the next search obtained by the weighting is denoted wg_scl;
- the quantization scale coefficient for the next search obtained by the binary search is denoted bi_scl (the binary search corresponds to a weighting coefficient of 0.5, so bi_scl is the midpoint of bf_scl and cr_scl);
- the quantization scale coefficient search unit 143 may determine the quantization scale coefficient nx_scl for the next search as a weighted sum of wg_scl and bi_scl. The weighting factor of this weighted sum may be changed from search to search.
- for example, the weight may be increased or decreased by 0.25 at each search, as in the following sequence:
- nx_scl = 1 × wg_scl + 0 × bi_scl
- nx_scl = 0.75 × wg_scl + 0.25 × bi_scl
- nx_scl = 0.5 × wg_scl + 0.5 × bi_scl
- nx_scl = 0.25 × wg_scl + 0.75 × bi_scl
- nx_scl = 0 × wg_scl + 1 × bi_scl, which is the same as the binary search method.
- nx_scl is expressed by equation (2).
- the search whose consumed bit amount is compared with that of the current search is not limited to the immediately preceding search, and may be an earlier search.
- the search whose quantization scale coefficient is determined from a plurality of searches is not limited to the immediately following search, and may be a later search.
- the search to be compared with the consumption bit amount in the current search is not limited to one search in the past, and the consumption bit amount in a plurality of past searches may be used.
- in addition to the above-mentioned operation (for example, adjustment of the quantization scale coefficient), the preprocessing unit 1421 may adjust (in other words, limit) the upper limit value of the quantization scale coefficient (initial value).
- the sparsity determination unit 1422 may determine the sparsity based on the output of the preprocessing unit 1421 (quantization scale coefficient with an adjusted upper limit).
- the preprocessing unit 1421 may set the threshold value n2 shown in FIG. 7 as the upper limit value.
- when the upper limit value of the quantization scale coefficient is adjusted to n2 in the preprocessing unit 1421, no quantization scale coefficient larger than the threshold value n2 is input to the sparsity determination unit 1422. Therefore, the threshold value n2 does not have to be set in the sparsity determination (for example, FIG. 7).
- the upper limit of the quantization scale coefficient in the preprocessing unit 1421 may be a value different from the threshold value n2.
- for example, the coding device 1 may perform pulse coding instead of arithmetic coding on the quantized MDCT spectrum when both of the following hold: it determines that the MDCT spectrum has sparseness, and the number of spectra occupying a threshold composition ratio (for example, 50%) is equal to or less than a threshold value. This processing can improve the coding efficiency.
- the coding unit 152 shown in FIG. 3 may include, for example, a switching unit for switching the coding method, an arithmetic coding unit, and a pulse coding unit. Further, the coding device 1 may generate, for example, information indicating a coding method applied to the coding of the MDCT spectrum and transmit the information to the decoding device 2.
- when the decoding device 2 supports a plurality of coding methods including, for example, arithmetic coding and pulse coding, and can identify the coding method used in the coding device 1, the information indicating the coding method does not have to be notified to the decoding device 2.
- Each functional block used in the description of the above embodiment may be partially or wholly realized as an LSI, which is an integrated circuit. Each process described in the above embodiment may be partially or wholly controlled by one LSI or by a combination of LSIs.
- the LSI may be composed of individual chips, or may be composed of one chip so as to include a part or all of functional blocks.
- the LSI may include data input and output.
- LSIs may be referred to as ICs, system LSIs, super LSIs, and ultra LSIs depending on the degree of integration.
- the method of making an integrated circuit is not limited to LSI, and may be realized by a dedicated circuit, a general-purpose processor, or a dedicated processor. Further, an FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor that can reconfigure the connection and settings of the circuit cells inside the LSI may be used.
- the present disclosure may be realized as digital processing or analog processing.
- the communication device may include a wireless transceiver and a processing / control circuit.
- the wireless transceiver may include a receiver and a transmitter, or may have them as functions.
- the wireless transceiver (transmitter, receiver) may include an RF (Radio Frequency) module and one or more antennas.
- RF modules may include amplifiers, RF modulators / demodulators, or the like.
- Non-limiting examples of communication devices include:
  - telephones (mobile phones, smartphones, etc.);
  - tablets;
  - personal computers (PCs) (laptops, desktops, notebooks, etc.);
  - cameras (digital still/video cameras, etc.);
  - digital players (digital audio/video players, etc.);
  - wearable devices (wearable cameras, smart watches, tracking devices, etc.);
  - game consoles and digital book readers;
  - telehealth and telemedicine (remote health care/medicine prescription) devices;
  - vehicles or mobile transportation with communication functions (automobiles, airplanes, ships, etc.);
  - combinations of the above-mentioned devices.
- Communication devices are not limited to portable or mobile devices. They also include all types of non-portable or fixed devices, apparatuses, and systems, such as:
  - smart home devices (home appliances, lighting equipment, smart meters or measuring instruments, control panels, etc.);
  - vending machines;
  - any other "Things" that can exist on an IoT (Internet of Things) network.
- Communication includes data communication using a combination of these, in addition to data communication using a cellular system, wireless LAN system, communication satellite system, etc.
- the communication device also includes a device such as a controller or a sensor that is coupled or connected to a communication device that executes the communication function described in the present disclosure.
- it includes controllers and sensors that generate control and data signals used by communication devices that perform the communication functions of the communication device.
- Communication devices also include infrastructure equipment that communicates with or controls these non-limiting devices, such as base stations, access points, and any other device, apparatus, or system.
- the quantization scale coefficient determining device includes a correction circuit that corrects an initial value of the quantization scale coefficient based on whether or not the spectrum of the audio-acoustic signal has sparseness, and a search circuit that searches for the quantization scale coefficient based on the initial value.
- a determination circuit for determining whether or not it has the sparsity is further provided.
- the determination circuit determines the sparsity based on the harmonic structure of the spectrum.
- the determination circuit determines the sparsity based on the number of spectra that occupy a ratio equal to or greater than a threshold value in the audio-acoustic signal.
- the determination circuit determines the sparsity based on the absolute value of the spectrum and the envelope of the spectrum.
- the determination circuit switches the conditions for determining the sparsity based on the initial value before correction calculated based on the spectrum.
- a preprocessing circuit for adjusting the upper limit of the initial value is further provided, and the determination circuit determines the sparsity based on the output of the preprocessing circuit.
- the search circuit determines the quantization scale coefficient in a third search after a first search based on two differences:
  - the difference between the consumed bit amount estimated for coding the spectrum in the first search and a target bit amount;
  - the difference between the consumed bit amount estimated for coding the spectrum in a second search prior to the first search and the target bit amount.
- a calculation circuit that calculates the initial value based on either the variance or the standard deviation of the spectral amplitude of the audio-acoustic signal is further provided.
- in the quantization scale coefficient determining method, the quantization scale coefficient determining device corrects the initial value of the quantization scale coefficient based on whether or not the spectrum of the audio-acoustic signal has sparseness. It then searches for the quantization scale coefficient based on the initial value.
- One embodiment of the present disclosure is useful for a voice signal or acoustic signal transmission system or the like.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
FIG. 1 shows a configuration example of a voice signal or acoustic signal transmission system according to the present embodiment.
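FIG. 2 shows a configuration example of the TCX coding unit 10 included in the coding device 1 shown in FIG. 1. The TCX coding unit 10 shown in FIG. 2 includes, for example, an envelope generation unit 11, a harmonics analysis unit 12, an envelope scaling unit 13, a rate loop processing unit 14, and a quantization/coding unit 15.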
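FIG. 3 shows a configuration example of two units included in the TCX coding unit 10 shown in FIG. 2: the rate loop processing unit 14 (corresponding to, for example, the quantization scale coefficient determining device) and the quantization/coding unit 15.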
FIG. 4 shows a configuration example of the sparse analysis unit 142.
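Next, an example of the conditions (determination methods) by which the sparsity determination unit 1422 determines whether or not the MDCT spectrum has sparsity will be described.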
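Under determination condition 1, the sparsity determination unit 1422 determines sparsity based on whether or not the MDCT spectrum has a "harmonic structure", as in FIG. 5(a) or FIG. 5(b).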
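Under determination condition 2, the sparsity determination unit 1422 determines sparsity based on the number of spectra that occupy a proportion (also called the "composition ratio") of the MDCT spectrum equal to or greater than a threshold, as in FIG. 5(c).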
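Determination condition 2 can be read as counting how many of the strongest MDCT bins are needed to reach a given share of the whole spectrum. The following is a minimal Python sketch under stated assumptions:
- the "share" is measured in energy (squared amplitude);
- the default ratio of 0.5 mirrors the 50% example given for pulse coding;
- the function names and the count threshold `max_count` are hypothetical, not taken from the patent.

```python
from typing import Sequence


def count_spectra_for_ratio(spectrum: Sequence[float], ratio: float = 0.5) -> int:
    """Return how many of the strongest bins are needed to reach `ratio` of the total energy."""
    # Energy per bin, strongest first (assumption: the share is measured in energy).
    energies = sorted((x * x for x in spectrum), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0  # silent frame: no bin carries energy
    running = 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= ratio * total:
            return count
    return len(energies)  # guards against rounding when ratio is close to 1


def is_sparse(spectrum: Sequence[float], ratio: float = 0.5, max_count: int = 8) -> bool:
    """Hypothetical condition-2 test: a few bins dominate the spectral energy."""
    return count_spectra_for_ratio(spectrum, ratio) <= max_count


# Usage: a frame with two strong lines versus a flat frame.
peaky = [0.0] * 64
peaky[5], peaky[20] = 4.0, 3.0
flat = [1.0] * 64
print(count_spectra_for_ratio(peaky), is_sparse(peaky))  # 1 True
print(count_spectra_for_ratio(flat), is_sparse(flat))    # 32 False
```

Sorting makes the count independent of where the peaks sit, which matches the idea that a sparse spectrum concentrates its energy in a few bins regardless of frequency.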
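Under determination condition 3, as with determination condition 2, the sparsity determination unit 1422 determines sparsity based on the number of spectra that occupy a proportion (or composition ratio) of the MDCT spectrum equal to or greater than a threshold, as in FIG. 5(d).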
In Variation 1, the quantization scale coefficient search unit 143 (for example, FIG. 3) may perform the search process shown in FIG. 8.
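The Variation 1 search blends a weighted estimate wg_scl with the binary-search midpoint bi_scl. The exact form of equation (1) is not given here, so the sketch below assumes one form consistent with the description. It is a false-position (linear interpolation) step in which the coefficient whose estimated bit consumption is closer to the target gets the larger weight: wg_scl = (|m|·cr_scl + |n|·bf_scl) / (|m| + |n|).

Other details of the sketch:
- the weight schedule 1, 0.75, 0.5, 0.25, 0 follows the example sequence given for nx_scl;
- the loop keeps, as the "previous" search, whichever earlier search lies on the other side of the target (the text allows comparing against a search before the previous one);
- all function names and the bit model in the usage line are hypothetical.

```python
from typing import Callable, Tuple

# Weight on wg_scl for successive searches; 0.0 reduces to plain binary search.
WEIGHT_SCHEDULE = (1.0, 0.75, 0.5, 0.25, 0.0)


def interpolated_scale(bf_scl: float, bf_bit: float, cr_scl: float, cr_bit: float,
                       t_bit: float) -> float:
    """Assumed form of equation (1): weight each scale by the other search's bit error."""
    m = abs(bf_bit - t_bit)  # bit error of the previous search
    n = abs(cr_bit - t_bit)  # bit error of the current search
    if m + n == 0.0:
        return cr_scl
    return (m * cr_scl + n * bf_scl) / (m + n)


def next_scale(bf_scl: float, bf_bit: float, cr_scl: float, cr_bit: float,
               t_bit: float, search_index: int) -> float:
    """nx_scl = w * wg_scl + (1 - w) * bi_scl, with w taken from WEIGHT_SCHEDULE."""
    wg_scl = interpolated_scale(bf_scl, bf_bit, cr_scl, cr_bit, t_bit)
    bi_scl = 0.5 * (bf_scl + cr_scl)  # binary-search midpoint
    w = WEIGHT_SCHEDULE[min(search_index, len(WEIGHT_SCHEDULE) - 1)]
    return w * wg_scl + (1.0 - w) * bi_scl


def search_scale(estimate_bits: Callable[[float], float], scl_a: float, scl_b: float,
                 t_bit: float, max_iter: int = 20, tol: float = 1.0) -> Tuple[float, float]:
    """Rate-loop sketch; scl_a and scl_b must bracket the target bit amount."""
    bf_scl, bf_bit = scl_a, estimate_bits(scl_a)
    cr_scl, cr_bit = scl_b, estimate_bits(scl_b)
    for i in range(max_iter):
        if abs(cr_bit - t_bit) <= tol:
            break
        nx_scl = next_scale(bf_scl, bf_bit, cr_scl, cr_bit, t_bit, i)
        nx_bit = estimate_bits(nx_scl)
        # Keep as "previous" whichever earlier search lies on the other side of the target.
        if (nx_bit - t_bit) * (cr_bit - t_bit) < 0:
            bf_scl, bf_bit = cr_scl, cr_bit
        cr_scl, cr_bit = nx_scl, nx_bit
    return cr_scl, cr_bit


# Usage with a hypothetical bit model: a larger scale coefficient consumes fewer bits.
scl, bits = search_scale(lambda s: 1000.0 / s, scl_a=5.0, scl_b=20.0, t_bit=100.0)
print(round(scl, 2), round(bits, 1))
```

Because wg_scl and bi_scl are both convex combinations of bf_scl and cr_scl, nx_scl always stays between them, as the text requires. Starting with w = 1 exploits the interpolation early; ending at w = 0 falls back to binary search, which guarantees that the bracket keeps shrinking.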
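In the sparse analysis unit 142 shown in FIG. 4, the preprocessing unit 1421 may, in addition to the above-described operation (for example, adjustment of the quantization scale coefficient), adjust (in other words, limit) the upper limit value of the quantization scale coefficient (initial value). In this case, the sparsity determination unit 1422 may determine sparsity based on the output of the preprocessing unit 1421 (the quantization scale coefficient whose upper limit has been adjusted).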
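The chain of preprocessing (limiting), sparsity determination, and correction can be sketched as follows. This is a minimal illustration under several assumptions:
- the upper limit is a threshold such as n2;
- the determination condition is switched at a single hypothetical threshold n1 of the limited, pre-correction value;
- correction simply replaces the initial value with a fixed correction value (for example 0.05) when the spectrum is judged sparse.

The condition functions, parameter values, and names are illustrative stand-ins, not the patent's actual conditions.

```python
from typing import Callable, Sequence, Tuple

Condition = Callable[[Sequence[float]], bool]


def correct_initial_scale(initial_scl: float, spectrum: Sequence[float], *,
                          upper_limit: float, n1: float,
                          condition_low: Condition, condition_high: Condition,
                          correction_value: float) -> Tuple[float, bool]:
    """Hypothetical chain: limit -> pick condition -> determine sparsity -> correct."""
    # Preprocessing (cf. unit 1421): clamp the initial value to the upper limit.
    limited = min(initial_scl, upper_limit)
    # Sparsity determination (cf. unit 1422): switch the condition on the limited value.
    condition = condition_low if limited < n1 else condition_high
    sparse = condition(spectrum)
    # Correction (cf. unit 1423): assumed to replace the value when the spectrum is sparse.
    return (correction_value if sparse else limited), sparse


# Stand-in conditions; real ones would inspect the MDCT spectrum.
def always_dense(spectrum: Sequence[float]) -> bool:
    return False


def always_sparse(spectrum: Sequence[float]) -> bool:
    return True


print(correct_initial_scale(0.8, [], upper_limit=0.5, n1=0.3,
                            condition_low=always_dense, condition_high=always_sparse,
                            correction_value=0.05))
```

Clamping before the determination means no value above the upper limit ever reaches the condition switch, which is why a separate n2 branch in the determination becomes unnecessary.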
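For example, the coding device 1 may perform pulse coding instead of arithmetic coding on the quantized MDCT spectrum when both of the following hold: it determines that the MDCT spectrum has sparsity, and the number of spectra occupying a threshold composition ratio (for example, 50%) is equal to or less than a threshold. This processing can improve coding efficiency.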
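The switch between pulse coding and arithmetic coding reduces to a small decision rule. A minimal sketch follows, with these assumptions:
- the count of spectra reaching the threshold composition ratio has already been computed;
- `max_count` is a hypothetical threshold;
- the enum and function names are illustrative only.

The enum value could serve as a one-bit method indicator when the decoder cannot infer the method on its own.

```python
from enum import Enum


class CodingMethod(Enum):
    """Hypothetical labels; the value could double as a 1-bit method flag for the decoder."""
    ARITHMETIC = 0
    PULSE = 1


def select_coding_method(sparse: bool, count_for_ratio: int, max_count: int) -> CodingMethod:
    """Pulse-code only sparse spectra whose dominant share is carried by few bins."""
    if sparse and count_for_ratio <= max_count:
        return CodingMethod.PULSE
    return CodingMethod.ARITHMETIC


print(select_coding_method(True, 3, 4))    # CodingMethod.PULSE
print(select_coding_method(True, 10, 4))   # CodingMethod.ARITHMETIC
print(select_coding_method(False, 3, 4))   # CodingMethod.ARITHMETIC
```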
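2 decoding device
10 TCX coding unit
11 envelope generation unit
12 harmonics analysis unit
13 envelope scaling unit
14 rate loop processing unit
15 quantization/coding unit
141 quantization scale coefficient calculation unit
142 sparse analysis unit
143 quantization scale coefficient search unit
151 quantization unit
152 coding unit
1421 preprocessing unit
1422 sparsity determination unit
1423 quantization scale coefficient correction unit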
Claims (10)
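1. A quantization scale coefficient determining device comprising:
a correction circuit that corrects an initial value of a quantization scale coefficient based on whether or not a spectrum of an audio-acoustic signal has sparsity; and
a search circuit that searches for the quantization scale coefficient based on the initial value.
2. The quantization scale coefficient determining device according to claim 1, further comprising a determination circuit that determines whether or not the spectrum has the sparsity.
3. The quantization scale coefficient determining device according to claim 2, wherein the determination circuit determines the sparsity based on a harmonic structure of the spectrum.
4. The quantization scale coefficient determining device according to claim 2, wherein the determination circuit determines the sparsity based on a number of spectra occupying a ratio equal to or greater than a threshold in the audio-acoustic signal.
5. The quantization scale coefficient determining device according to claim 2, wherein the determination circuit determines the sparsity based on absolute values of the spectrum and an envelope of the spectrum.
6. The quantization scale coefficient determining device according to claim 2, wherein the determination circuit switches a condition for determining the sparsity based on the initial value before correction, which is calculated based on the spectrum.
7. The quantization scale coefficient determining device according to claim 2, further comprising a preprocessing circuit that adjusts an upper limit of the initial value,
wherein the determination circuit determines the sparsity based on an output of the preprocessing circuit.
8. The quantization scale coefficient determining device according to claim 1, wherein the search circuit determines the quantization scale coefficient in a third search after a first search based on:
a difference between a consumed bit amount estimated for coding of the spectrum in the first search and a target bit amount; and
a difference between a consumed bit amount estimated for coding of the spectrum in a second search prior to the first search and the target bit amount.
9. The quantization scale coefficient determining device according to claim 1, further comprising a calculation circuit that calculates the initial value based on either a variance or a standard deviation of spectral amplitudes of the audio-acoustic signal.
10. A quantization scale coefficient determining method, wherein a quantization scale coefficient determining device:
corrects an initial value of a quantization scale coefficient based on whether or not a spectrum of an audio-acoustic signal has sparsity; and
searches for the quantization scale coefficient based on the initial value.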
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021552264A JPWO2021075167A1 (ja) | 2019-10-16 | 2020-09-04 | |
US17/768,801 US20230025447A1 (en) | 2019-10-16 | 2020-09-04 | Quantization scale factor determination device and quantization scale factor determination method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019-189177 | 2019-10-16 | ||
JP2019189177 | 2019-10-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021075167A1 (ja) | 2021-04-22 |
Family
ID=75537592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/033579 WO2021075167A1 (ja) | Quantization scale coefficient determining device and quantization scale coefficient determining method | 2019-10-16 | 2020-09-04 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230025447A1 (ja) |
JP (1) | JPWO2021075167A1 (ja) |
WO (1) | WO2021075167A1 (ja) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009198612A (ja) * | 2008-02-19 | 2009-09-03 | Fujitsu Ltd | Coding device, coding method, and coding program |
JP2016533515A (ja) * | 2013-10-18 | 2016-10-27 | Telefonaktiebolaget LM Ericsson (publ) | Coding and decoding of spectral peak positions |
2020
- 2020-09-04 JP JP2021552264A patent/JPWO2021075167A1/ja active Pending
- 2020-09-04 WO PCT/JP2020/033579 patent/WO2021075167A1/ja active Application Filing
- 2020-09-04 US US17/768,801 patent/US20230025447A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20230025447A1 (en) | 2023-01-26 |
JPWO2021075167A1 (ja) | 2021-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102576542B (zh) | | Method and device for determining an upper-band signal from a narrowband signal |
CN101223582B (zh) | | Audio coding method, audio decoding method, and audio encoder |
US8099275B2 (en) | | Sound encoder and sound encoding method for generating a second layer decoded signal based on a degree of variation in a first layer decoded signal |
US10121480B2 (en) | | Method and apparatus for encoding audio data |
US10643623B2 (en) | | Audio signal coding apparatus, audio signal decoding apparatus, audio signal coding method, and audio signal decoding method |
EP2831875B1 (en) | | Bandwidth extension of harmonic audio signal |
EP2863388B1 (en) | | Bit allocation method and device for audio signal |
CN104956438B (zh) | | System and method for performing noise modulation and gain adjustment |
CN111710342B (zh) | | Coding device, decoding device, coding method, decoding method, and program |
JP5262171B2 (ja) | | Coding device, coding method, and coding program |
EP2127088B1 (en) | | Audio quantization |
KR20070090217A (ko) | | Scalable coding device and scalable coding method |
WO2021075167A1 (ja) | | Quantization scale coefficient determining device and quantization scale coefficient determining method |
US8438012B2 (en) | | Method and apparatus for adaptive sub-band allocation of spectral coefficients |
US8731081B2 (en) | | Apparatus and method for combinatorial coding of signals |
US8711012B2 (en) | | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
US20130096927A1 (en) | | Audio coding device and audio coding method, audio decoding device and audio decoding method, and program |
JP6179087B2 (ja) | | Audio coding device, audio coding method, and computer program for audio coding |
US20120263312A1 (en) | | Rate controller, rate control method, and rate control program |
WO2018052004A1 (ja) | | Sample sequence transformation device, signal coding device, signal decoding device, sample sequence transformation method, signal coding method, signal decoding method, and program |
JP2002311997A (ja) | | Audio signal coding device |
CN117715072A (zh) | | Information transmission method, AI network model training method, apparatus, and communication device |
CN116631418A (zh) | | Speech coding and decoding method, apparatus, computer device, and storage medium |
JPWO2020009082A1 (ja) | | Coding device and coding method |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20875945; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2021552264; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20/07/2022) |
122 | EP: PCT application non-entry in European phase | Ref document number: 20875945; Country of ref document: EP; Kind code of ref document: A1 |