WO2007043648A1 - Transform coding apparatus and transform coding method - Google Patents
Transform coding apparatus and transform coding method
- Publication number
- WO2007043648A1 (application PCT/JP2006/320457)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scale factor
- spectrum
- distortion
- unit
- weighted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0208—Subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
Definitions
- the present invention relates to a transform coding apparatus and transform coding method for encoding an input signal in a frequency domain.
- A known scalable coding technique hierarchically combines a first layer, which encodes the input signal at a low bit rate with a model suited to speech signals, and a second layer, which encodes the differential signal between the input signal and the first layer decoded signal with a model that also suits signals other than speech (see, for example, Non-Patent Document 1).
- That document shows an example of scalable coding using techniques specified by MPEG-4 (Moving Picture Experts Group phase-4).
- Specifically, CELP (Code Excited Linear Prediction) is used for the first layer, and transform coding such as AAC (Advanced Audio Coder) or TwinVQ (Transform Domain Weighted Interleave Vector Quantization) is applied in the second layer to the residual signal obtained by subtracting the first layer decoded signal from the original signal.
- TwinVQ can be regarded as a technique that separates the MDCT coefficients into scale factors and a fine spectrum and encodes each of them separately.
- Non-Patent Document 1: Satoshi Miki (ed.), “All of MPEG-4 (First Edition)”, Industrial Research Committee, Inc., September 30, 1998, p. 126-127
- Non-Patent Document 2: Naoki Iwagami, Takehiro Moriya, Mitsumata, Kazunaga Ikeda, and Akio Kamin, “Musical Coding with Frequency Domain Weighted Interleaved Vector Quantization (TwinVQ)”, Theory of Science (A), 199 May, vol. J80-A, no. 5, p. 830-837
- The weight function w_i expressed by equation (1) above is a function of the Bark scale index i, that is, of frequency.
- When the Bark scale index i is the same, the weight w_i that multiplies the difference (E_i - C_i(m)) between the input scale factor and the quantization candidate is always the same.
- Here, w_i represents the weight corresponding to Bark scale index i and is calculated from the magnitude of the spectral envelope.
- The weight on the average amplitude is small for a band with a small spectral envelope and large for a band with a large spectral envelope.
- Consequently, a band with a large spectral envelope is treated as important during coding, whereas a band with a small spectral envelope is treated as less important.
- An object of the present invention is to provide a transform coding apparatus and a transform coding method capable of reducing degradation of perceptual speech quality even when a sufficient number of bits cannot be allocated.
- A transform coding apparatus according to the present invention adopts a configuration comprising: input scale factor calculation means for calculating a plurality of input scale factors corresponding to an input spectrum; a codebook that stores a plurality of scale factors and outputs one scale factor;
- distortion calculation means for calculating the distortion between one of the plurality of input scale factors and the scale factor output from the codebook;
- weighted distortion calculation means for calculating a weighted distortion in which the distortion obtained when the one input scale factor is smaller than the scale factor output from the codebook is weighted more heavily than the distortion obtained when the one input scale factor is larger than the scale factor output from the codebook;
- and search means for searching the codebook for the scale factor that minimizes the weighted distortion.
- FIG. 1 is a block diagram showing the main configuration of a scalable coding apparatus according to Embodiment 1.
- FIG. 2 is a block diagram showing the main configuration inside the second layer coding unit according to Embodiment 1.
- FIG. 3 is a block diagram showing the main configuration inside the correction scale factor coding unit according to Embodiment 1.
- FIG. 4 is a block diagram showing the main configuration of the scalable decoding apparatus according to Embodiment 1.
- FIG. 5 is a block diagram showing the main configuration inside the second layer decoding unit according to Embodiment 1.
- FIG. 6 is a block diagram showing the main configuration inside the second layer coding unit according to Embodiment 2.
- FIG. 7 is a block diagram showing the main configuration inside the second layer decoding unit according to Embodiment 2.
- FIG. 8 is a block diagram showing the main configuration inside the second layer coding unit according to Embodiment 3.
- FIG. 9 is a block diagram showing the main configuration of the transform coding apparatus according to Embodiment 4.
- FIG. 10 is a block diagram showing the main configuration inside the scale factor coding unit according to Embodiment 4.
- FIG. 11 is a block diagram showing the main configuration of the transform decoding apparatus according to Embodiment 4.
- FIG. 12 is a block diagram showing the main configuration of the scalable coding apparatus according to Embodiment 5.
- FIG. 13 is a block diagram showing the main configuration inside the second layer coding unit according to Embodiment 5.
- FIG. 14 is a block diagram showing the main configuration inside the correction scale factor coding unit according to Embodiment 5.
- FIG. 15 is a block diagram showing the main configuration inside the second layer decoding unit according to Embodiment 5.
- FIG. 16 is a block diagram showing the main configuration inside the second layer coding unit according to Embodiment 6.
- FIG. 17 is a block diagram showing the main configuration inside the correction scale factor coding unit according to Embodiment 6.
- FIG. 18 is a block diagram showing the main configuration of the scalable decoding apparatus according to Embodiment 7.
- FIG. 19 is a block diagram showing the main configuration inside the modified LPC calculation unit according to Embodiment 7.
- FIG. 20 is a schematic diagram showing the signal band and voice quality of each layer according to Embodiment 7.
- FIG. 21 is a spectral characteristic diagram showing power spectrum correction by the first realization method according to Embodiment 7.
- FIG. 22 is a spectral characteristic diagram showing power spectrum correction by the second realization method according to Embodiment 7.
- FIG. 23 is a spectral characteristic diagram of a post filter configured using the modified LPC coefficients according to Embodiment 7.
- FIG. 24 is a block diagram showing the main configuration of the scalable decoding apparatus according to Embodiment 8.
- FIG. 25 is a block diagram showing the main configuration inside the suppression information calculation unit according to Embodiment 8.
- Scalable coding is a coding scheme with a hierarchical structure consisting of a plurality of layers, and is characterized in that the coding parameters generated in each layer have scalability.
- That is, a decoded signal of a certain quality can be obtained from the coding parameters of only some of the layers, and a decoded signal of higher quality is obtained by decoding with the coding parameters of more layers.
- Embodiments 1 to 3 and 5 to 8 describe cases in which the present invention is applied to scalable coding, and Embodiment 4 describes a case in which it is applied to coding consisting of a single layer. In Embodiments 1 to 3 and 5 to 8, the following case is described as an example.
- The second layer performs coding in the frequency domain, that is, transform coding, and uses the MDCT (Modified Discrete Cosine Transform) as the transform method.
- FIG. 1 is a block diagram showing the main configuration of a scalable coding apparatus including a transform coding apparatus according to Embodiment 1 of the present invention.
- The scalable coding apparatus according to this embodiment includes downsampling unit 101, first layer coding unit 102, multiplexing unit 103, first layer decoding unit 104, delay unit 105, and second layer coding unit 106, and each unit operates as follows.
- Downsampling unit 101 generates a signal of sampling rate F1 (F1 ≤ F2) from the input signal of sampling rate F2 and supplies it to first layer coding unit 102.
- First layer coding unit 102 encodes the signal of sampling rate F1 output from downsampling unit 101. The coding parameter obtained by first layer coding unit 102 is supplied to multiplexing unit 103 and also to first layer decoding unit 104.
- First layer decoding unit 104 generates a first layer decoded signal from the coding parameter output from first layer coding unit 102.
- Delay unit 105 delays the input signal by a predetermined length. This delay compensates for the time delay introduced by downsampling unit 101, first layer coding unit 102, and first layer decoding unit 104.
- Second layer coding unit 106 performs transform coding on the input signal delayed by the predetermined time in delay unit 105, using the first layer decoded signal generated in first layer decoding unit 104, and outputs the generated coding parameter to multiplexing unit 103.
- Multiplexing unit 103 multiplexes the coding parameter obtained by first layer coding unit 102 with the coding parameter obtained by second layer coding unit 106 and outputs the result as the final coding parameter.
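The data flow between units 101 to 106 can be sketched as follows. This is only an illustrative outline: the function names, the use of caller-supplied callables for the codecs, and the representation of the multiplexed output as a tuple are assumptions made for this sketch and are not taken from the patent.

```python
from typing import Any, Callable, List, Sequence, Tuple


def delay_signal(x: Sequence[float], n: int) -> List[float]:
    """Delay unit 105: shift x by n samples, padding with zeros, same length."""
    if n <= 0:
        return list(x)
    return ([0.0] * n + list(x))[: len(x)]


def scalable_encode(
    x: Sequence[float],
    downsample: Callable[[Sequence[float]], Sequence[float]],
    layer1_encode: Callable[[Sequence[float]], Any],
    layer1_decode: Callable[[Any], Sequence[float]],
    layer2_encode: Callable[[Sequence[float], Sequence[float]], Any],
    delay_samples: int,
) -> Tuple[Any, Any]:
    """Wire the units together; the codecs themselves are placeholders."""
    low_rate = downsample(x)                   # downsampling unit 101 (F2 -> F1)
    params1 = layer1_encode(low_rate)          # first layer coding unit 102
    decoded1 = layer1_decode(params1)          # first layer decoding unit 104
    delayed = delay_signal(x, delay_samples)   # delay unit 105
    params2 = layer2_encode(delayed, decoded1) # second layer coding unit 106
    return params1, params2                    # multiplexing unit 103 (as a tuple)
```

Passing the codecs in as callables keeps the sketch independent of any particular first layer or second layer algorithm; only the ordering of the steps and the role of the delay are illustrated.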
- FIG. 2 is a block diagram showing the main configuration inside second layer coding unit 106.
- Second layer coding unit 106 includes MDCT analysis units 111 and 112, high-band spectrum estimation unit 113, and correction scale factor coding unit 114, and each unit operates as follows.
- MDCT analysis unit 111 performs MDCT analysis on the first layer decoded signal to calculate a low-band spectrum (narrowband spectrum) of signal band (frequency band) 0 to FL, and outputs it to high-band spectrum estimation unit 113.
- MDCT analysis unit 112 performs MDCT analysis on the speech signal that is the original signal and calculates a wideband spectrum of signal band 0 to FH. Of this, the high-band spectrum, which has the same bandwidth as the narrowband spectrum and occupies the high band FL to FH, is output to high-band spectrum estimation unit 113 and correction scale factor coding unit 114. Here, the signal band of the narrowband spectrum and that of the wideband spectrum satisfy FL < FH.
- High-band spectrum estimation unit 113 estimates the high-band spectrum of signal band FL to FH using the low-band spectrum of signal band 0 to FL, and obtains an estimated spectrum.
- The estimated spectrum is derived by transforming the low-band spectrum so that the similarity to the high-band spectrum is maximized.
- High-band spectrum estimation unit 113 encodes information about the estimated spectrum (estimation information), outputs the resulting coding parameter, and supplies the estimated spectrum itself to correction scale factor coding unit 114.
- In the following description, the estimated spectrum output from high-band spectrum estimation unit 113 is referred to as the first spectrum, and the high-band spectrum output from MDCT analysis unit 112 is referred to as the second spectrum.
- Narrowband spectrum (low-band spectrum): 0 to FL
- Correction scale factor coding unit 114 corrects the scale factor of the first spectrum so that it approaches the scale factor of the second spectrum, and encodes and outputs information about this correction scale factor.
- FIG. 3 is a block diagram showing the main configuration inside correction scale factor coding unit 114.
- Correction scale factor coding unit 114 includes scale factor calculation units 121 and 122, correction scale factor codebook 123, multiplier 124, subtractor 125, determination unit 126, weighted error calculation unit 127, and search unit 128, and each unit operates as follows.
- Scale factor calculation unit 121 divides signal band FL to FH of the input second spectrum into a plurality of subbands, obtains the magnitude of the spectrum contained in each subband, and outputs it to subtractor 125. Specifically, the division into subbands follows the critical bands, with the subbands spaced at equal intervals on the Bark scale.
- Scale factor calculation unit 121 calculates the average amplitude of the spectrum contained in each subband and uses it as the second scale factor SF2(k) {0 ≤ k < NB}.
- Here, NB represents the number of subbands.
- The maximum amplitude value may be used instead of the average amplitude.
- Scale factor calculation unit 122 divides signal band FL to FH of the input first spectrum into a plurality of subbands, calculates the first scale factor SF1(k) {0 ≤ k < NB} of each subband, and outputs it to multiplier 124. Like scale factor calculation unit 121, scale factor calculation unit 122 may use the maximum amplitude value or the like instead of the average amplitude.
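The per-subband scale factor calculation described above can be illustrated with a short sketch. The function name and the `band_edges` argument (bin indices marking the subband boundaries) are assumptions of this example; how the edges are derived from the Bark scale is not shown.

```python
from typing import List, Sequence


def scale_factors(
    spectrum: Sequence[float],
    band_edges: Sequence[int],
    use_max: bool = False,
) -> List[float]:
    """Return one scale factor per subband.

    spectrum   : MDCT coefficients of band FL to FH
    band_edges : hypothetical bin indices; subband k spans
                 band_edges[k] .. band_edges[k + 1] - 1
    use_max    : use the maximum amplitude instead of the average amplitude
    """
    factors = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        magnitudes = [abs(c) for c in spectrum[lo:hi]]
        if use_max:
            factors.append(max(magnitudes))
        else:
            factors.append(sum(magnitudes) / len(magnitudes))
    return factors
```

Applying the same function to the first and second spectra with identical `band_edges` gives SF1(k) and SF2(k) on a common subband grid, with NB equal to `len(band_edges) - 1`.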
- In the following description, the parameters of the plurality of subbands are combined into one vector value.
- For example, the NB scale factors are expressed as one vector.
- A case where each process is performed vector by vector, that is, a case where vector quantization is performed, is described as an example.
- Correction scale factor codebook 123 stores a plurality of correction scale factor candidates and, in accordance with instructions from search unit 128, sequentially outputs the stored candidates one at a time to multiplier 124. The correction scale factor candidates stored in correction scale factor codebook 123 are represented by vectors.
- Multiplier 124 multiplies the first scale factor output from scale factor calculation unit 122 by the correction scale factor candidate output from correction scale factor codebook 123, and supplies the product to subtractor 125.
- Subtractor 125 subtracts the output of multiplier 124, that is, the product of the first scale factor and the correction scale factor candidate, from the second scale factor output from scale factor calculation unit 121, and supplies the resulting error signal to weighted error calculation unit 127 and determination unit 126.
- Determination unit 126 determines the weight vector to be given to weighted error calculation unit 127 based on the sign of the error signal supplied from subtractor 125. Specifically, the error signal d(k) supplied from subtractor 125 is expressed by equation (2).
- $d(k) = SF2(k) - V_i(k) \cdot SF1(k) \quad (0 \le k < NB)$ ... (2)
- Here, $V_i(k)$ represents the i-th correction scale factor candidate.
- Determination unit 126 checks the sign of d(k), selects $w_{pos}$ as the weight if it is positive and $w_{neg}$ if it is negative, and outputs the weight vector w(k) composed of these weights to weighted error calculation unit 127.
- These weights satisfy the magnitude relationship of equation (3).
- $0 < w_{pos} < w_{neg}$ ... (3)
- Weighted error calculation unit 127 first calculates the square of the error signal supplied from subtractor 125, then multiplies the squared error by the weight vector w(k) supplied from determination unit 126 to obtain the weighted square error E, and supplies the result to search unit 128.
- The weighted square error E is expressed by equation (4).
- $E = \sum_{k=0}^{NB-1} w(k) \cdot d(k)^2$ ... (4)
- Search unit 128 controls correction scale factor codebook 123 so that it sequentially outputs the stored correction scale factor candidates, and finds, by closed-loop processing, the candidate that minimizes the weighted square error E output from weighted error calculation unit 127.
- Search unit 128 outputs the index $i_{opt}$ of the correction scale factor candidate thus obtained as a coding parameter.
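The closed-loop search performed by units 123 to 128 can be sketched as below. The error for each candidate is d(k) = SF2(k) - V_i(k)·SF1(k), each squared error is multiplied by a weight chosen from the sign of d(k), and the candidate with the smallest weighted sum is kept. The function name, the default weight values, and the treatment of d(k) = 0 as positive are assumptions of this sketch; the only property taken from the text is that a positive error is penalized less than a negative one.

```python
from typing import List, Sequence, Tuple


def search_correction(
    sf1: Sequence[float],
    sf2: Sequence[float],
    codebook: Sequence[Sequence[float]],
    w_pos: float = 1.0,   # weight when d(k) is positive (assumed value)
    w_neg: float = 4.0,   # weight when d(k) is negative (assumed value)
) -> Tuple[int, float]:
    """Return (index, weighted square error) of the best codebook entry."""
    best_index, best_error = -1, float("inf")
    for i, candidate in enumerate(codebook):
        error = 0.0
        for s1, s2, v in zip(sf1, sf2, candidate):
            d = s2 - v * s1                    # error signal for subband k
            w = w_pos if d >= 0 else w_neg     # weight chosen by the sign of d(k)
            error += w * d * d                 # accumulate weighted square error
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error
```

For example, with `sf1 = [1.0]`, `sf2 = [1.0]` and `codebook = [[0.7], [1.2]]`, equal weights would pick the second entry because it is numerically closer, whereas the asymmetric weights pick the first entry, which keeps the corrected scale factor below the target.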
- FIG. 4 is a block diagram showing the main configuration of the scalable decoding apparatus according to this embodiment.
- Separation unit 151 performs separation processing on the input bit stream carrying the coding parameters, and generates the coding parameter for first layer decoding unit 152 and the coding parameter for second layer decoding unit 153.
- First layer decoding unit 152 decodes a decoded signal of signal band 0 to FL using the coding parameter obtained by separation unit 151, and outputs this decoded signal. First layer decoding unit 152 also supplies the obtained decoded signal to second layer decoding unit 153.
- Second layer decoding unit 153 receives the coding parameter separated by separation unit 151 and the first layer decoded signal output from first layer decoding unit 152. Second layer decoding unit 153 decodes the spectrum, converts it to a time-domain signal, and generates and outputs a wideband decoded signal of signal band 0 to FH.
- FIG. 5 is a block diagram showing the main configuration inside second layer decoding unit 153.
- Second layer decoding unit 153 is the component corresponding to second layer coding unit 106 in the transform coding apparatus according to this embodiment.
- MDCT analysis unit 161 performs MDCT analysis on the first layer decoded signal, calculates the first spectrum of signal band 0 to FL, and outputs it to high-band spectrum decoding unit 162.
- High-band spectrum decoding unit 162 decodes the estimated spectrum (fine spectrum) of signal band FL to FH using the first spectrum and the coding parameter (estimation information) transmitted from the transform coding apparatus according to this embodiment. The resulting estimated spectrum is supplied to multiplier 164.
- Correction scale factor decoding unit 163 decodes the correction scale factor using the coding parameter (correction scale factor) transmitted from the transform coding apparatus according to this embodiment. Specifically, it refers to a built-in correction scale factor codebook (not shown) and outputs the corresponding correction scale factor to multiplier 164.
- Multiplier 164 multiplies the estimated spectrum output from high-band spectrum decoding unit 162 by the correction scale factor output from correction scale factor decoding unit 163, and outputs the product to concatenation unit 165.
- Concatenation unit 165 concatenates the first spectrum and the estimated spectrum output from multiplier 164 on the frequency axis, generates a wideband decoded spectrum of signal band 0 to FH, and outputs it to time domain transform unit 166.
- Time domain transform unit 166 applies inverse MDCT processing to the decoded spectrum output from concatenation unit 165, multiplies the result by an appropriate window function, and then adds it to the corresponding region of the windowed signal of the previous frame to generate and output the second layer decoded signal.
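The spectral part of the decoder (multiplier 164 and concatenation unit 165) can be sketched as follows. The function name and the `band_edges` argument are hypothetical, and the sketch assumes that each decoded correction scale factor is applied to every bin of its subband; the inverse MDCT, windowing, and overlap-add of time domain transform unit 166 are omitted.

```python
from typing import List, Sequence


def decode_wideband_spectrum(
    low_spectrum: Sequence[float],
    estimated_high: Sequence[float],
    band_edges: Sequence[int],
    correction: Sequence[float],
) -> List[float]:
    """Scale the estimated high band per subband and append it to the low band.

    band_edges : hypothetical bin indices into estimated_high; subband k
                 spans band_edges[k] .. band_edges[k + 1] - 1
    correction : decoded correction scale factor for each subband
    """
    corrected = list(estimated_high)
    for k, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        for n in range(lo, hi):
            corrected[n] = estimated_high[n] * correction[k]  # multiplier 164
    return list(low_spectrum) + corrected                     # concatenation unit 165
```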
- As described above, according to this embodiment, in the frequency-domain coding of the higher layer, the input signal is converted into frequency-domain coefficients and the scale factor is quantized.
- The scale factor is quantized using a weighted distortion measure that makes quantization candidates with a small scale factor easier to select. That is, a quantized scale factor smaller than the scale factor before quantization is more likely to be chosen. Therefore, even when the number of bits allocated to scale factor quantization is insufficient, degradation of subjective audio quality can be suppressed.
- In the conventional method, the weight function w expressed by equation (1) above is always the same when the Bark scale index i is the same.
- In this embodiment, by contrast, the weight that multiplies the difference is changed according to the difference (E - C(m)) between the input signal and the quantization candidate.
- Specifically, the weights are set so that a quantization candidate C(m) for which E - C(m) is positive is more likely to be selected than a quantization candidate C(m) for which E - C(m) is negative.
- As a result, the scale factor after quantization tends to be smaller than the original scale factor.
- The correction scale factor candidates contained in the correction scale factor codebook may also be represented by scalars.
- The basic configuration of the scalable coding apparatus including the transform coding apparatus according to Embodiment 2 of the present invention is the same as that of Embodiment 1. Its description is therefore omitted, and second layer coding unit 206, whose configuration differs from that of Embodiment 1, is described below.
- FIG. 6 is a block diagram showing the main configuration inside second layer coding unit 206.
- Second layer coding unit 206 has the same basic configuration as second layer coding unit 106 of Embodiment 1; identical components are given the same reference numerals and their description is omitted.
- Components whose basic operation is the same but which differ in detail are given the same number followed by a lowercase letter. The same notation is used in the description of other configurations.
- Second layer coding unit 206 further includes auditory masking calculation unit 211 and bit allocation determination unit 212, and correction scale factor coding unit 114a encodes the correction scale factor based on the bit allocation determined by bit allocation determination unit 212.
- Auditory masking calculation unit 211 analyzes the input signal, calculates an auditory masking value representing the permissible level of quantization distortion, and outputs it to bit allocation determination unit 212.
- Bit allocation determination unit 212 determines how many bits are allocated to which subband, and outputs this bit allocation information both to the outside and to correction scale factor coding unit 114a.
- Correction scale factor coding unit 114a quantizes the correction scale factor candidate using the number of bits given by the bit allocation information output from bit allocation determination unit 212, and outputs the index as a coding parameter. In doing so, it sets the size of the weights for each subband based on the number of quantization bits of the correction scale factor. Specifically, for the correction scale factor of a subband with a small number of quantization bits, correction scale factor coding unit 114a sets the two weights so that the difference between weight w_pos, used when error signal d(k) is positive, and weight w_neg, used when error signal d(k) is negative, is large, and …
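One way to tie the weight gap to the bit allocation is sketched below. The text only states that the gap between the positive-error weight and the negative-error weight is made large for subbands with few quantization bits; the linear mapping, the parameter names, and the default values here are all assumptions chosen for illustration.

```python
from typing import Tuple


def weights_for_bits(
    bits: int,
    max_bits: int = 8,        # assumed largest allocation per subband
    w_pos: float = 1.0,       # assumed weight for a positive error
    max_ratio: float = 4.0,   # assumed w_neg / w_pos when no bits are available
) -> Tuple[float, float]:
    """Return (w_pos, w_neg); fewer bits gives a larger gap between them."""
    bits = max(0, min(bits, max_bits))
    # The ratio falls linearly from max_ratio (0 bits) to 1.0 (max_bits).
    ratio = 1.0 + (max_ratio - 1.0) * (max_bits - bits) / max_bits
    return w_pos, w_pos * ratio
```

Under this mapping a subband quantized with few bits is pushed more strongly toward a smaller quantized scale factor, while a subband with many bits is quantized almost symmetrically.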
- Next, the scalable decoding apparatus according to this embodiment is described.
- The basic configuration of the scalable decoding apparatus according to this embodiment is the same as that of the scalable decoding apparatus of Embodiment 1, so second layer decoding unit 253, whose configuration differs from that of Embodiment 1, is described below.
- FIG. 7 is a block diagram showing the main configuration inside second layer decoding unit 253.
- Bit allocation decoding unit 261 decodes the number of bits of each subband using the coding parameter (bit allocation information) sent from the scalable coding apparatus according to this embodiment, and outputs the obtained number of bits to correction scale factor decoding unit 163a.
- Correction scale factor decoding unit 163a decodes the correction scale factor using the number of bits of each subband and the coding parameter (correction scale factor), and outputs the obtained correction scale factor to multiplier 164. The other processes are the same as in Embodiment 1.
- As described above, according to this embodiment, the weights are changed according to the number of quantization bits allocated to the scale factor of each band. Specifically, for a scale factor with a small number of quantization bits, the weights are set so that the difference between weight w_pos, used when error signal d(k) is positive, and weight w_neg, used when error signal d(k) is negative, is large.
- The basic configuration of the scalable coding apparatus including the transform coding apparatus according to Embodiment 3 of the present invention is also the same as that of Embodiment 1. Therefore, its description is omitted, and second layer encoding section 306, whose configuration differs from that of Embodiment 1, will be described below.
- FIG. 8 is a block diagram showing the main configuration inside second layer encoding section 306.
- The similarity calculation unit 311 calculates the similarity between the second spectrum of signal band FL to FH and the estimated spectrum of signal band FL to FH, and outputs the obtained similarity to the correction scale factor encoding unit 114b.
- The similarity is defined, for example, by the SNR (Signal-to-Noise Ratio) of the estimated spectrum with respect to the second spectrum.
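- As a rough illustration of an SNR-based similarity, the sketch below treats the second spectrum as the signal and its difference from the estimated spectrum as the noise. The function name and the handling of the degenerate cases are assumptions made for this example; the text only states that the similarity may be defined by an SNR.

```python
import math


def spectrum_snr_db(second_spectrum, estimated_spectrum):
    """SNR in dB of an estimated spectrum measured against the second spectrum.

    Hypothetical helper: "signal" is the energy of the second spectrum and
    "noise" is the energy of the estimation error.
    """
    signal = sum(t * t for t in second_spectrum)
    noise = sum((t - e) ** 2 for t, e in zip(second_spectrum, estimated_spectrum))
    if noise == 0.0:
        return math.inf   # perfect estimate
    if signal == 0.0:
        return -math.inf  # nothing to estimate, yet the estimate is non-zero
    return 10.0 * math.log10(signal / noise)
```

A higher value means that the shape of the estimated spectrum is closer to that of the second spectrum.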
- The correction scale factor encoding unit 114b quantizes the correction scale factor candidate based on the similarity output from the similarity calculation unit 311, and outputs the resulting index as an encoding parameter. At that time, the weight for each subband is set based on the similarity of that subband. Specifically, for the correction scale factor of a subband with low similarity, the correction scale factor encoding unit 114b sets the two weights, namely the weight w_pos used when the error signal d(k) is positive and the weight w_neg used when the error signal d(k) is negative, so that the difference between them is large. For the correction scale factor of a subband with high similarity, it sets the two weights so that the difference between them is small.
- In this way, the weight is changed according to the accuracy (for example, similarity or SNR) with which the shape of the estimated spectrum of each band matches the spectrum of the original signal.
- Specifically, for the scale factor of a subband with low similarity, the weights are set so that the difference between the weight w_pos used when the error signal d(k) is positive and the weight w_neg used when it is negative is large.
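- One possible way to turn a per-subband similarity into the pair of weights is sketched below. The linear mapping and every constant in it (`w_pos`, `max_gap`, `snr_hi_db`) are hypothetical; the text only requires that the gap between the two weights be large for low similarity and small for high similarity.

```python
def weights_from_similarity(snr_db, w_pos=1.0, max_gap=3.0, snr_hi_db=20.0):
    """Return (w_pos, w_neg) for one subband.

    Illustrative mapping only: the similarity is clamped to [0, snr_hi_db]
    and the gap between the weights shrinks linearly as similarity grows.
    """
    s = min(max(snr_db, 0.0), snr_hi_db)
    gap = max_gap * (1.0 - s / snr_hi_db)
    return w_pos, w_pos + gap
```

The same shape of mapping could be driven by the number of quantization bits instead of the similarity, with a small bit count playing the role of low similarity.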
- In the above description, the case where the input to correction scale factor encoding sections 114, 114a, and 114b consists of two spectra having different characteristics, namely a first spectrum and a second spectrum, has been shown as an example.
- However, the input to correction scale factor encoding sections 114, 114a, and 114b may be a single spectrum. An embodiment for such a case will be described below.
- FIG. 9 is a block diagram showing the main configuration of the transform coding apparatus according to the present embodiment.
- Here, MDCT is used as the transform method.
- The transform coding apparatus includes an MDCT analysis unit 401, a scale factor encoding unit 402, a fine spectrum encoding unit 403, and a multiplexing unit 404, and each unit performs the following operations.
- MDCT analysis section 401 performs MDCT analysis on the original speech signal and outputs the obtained spectrum to scale factor encoding unit 402 and fine spectrum encoding unit 403.
- Scale factor encoding section 402 divides the signal band of the spectrum obtained by MDCT analysis section 401 into a plurality of subbands, calculates the scale factor of each subband, and quantizes these scale factors. Details of this quantization will be described later.
- The scale factor encoding unit 402 outputs the encoding parameter (scale factor) obtained by the quantization to the multiplexing unit 404 and outputs the decoded scale factor itself to the fine spectrum encoding unit 403.
- Fine spectrum encoding unit 403 normalizes the spectrum given from MDCT analysis unit 401 using the decoded scale factor output from scale factor encoding unit 402, and encodes the normalized spectrum.
- The fine spectrum encoding unit 403 outputs the obtained encoding parameter (fine spectrum) to the multiplexing unit 404.
- FIG. 10 is a block diagram showing the main configuration inside scale factor encoding unit 402.
- The scale factor encoding unit 402 has the same basic configuration as the scale factor encoding unit 114 shown in Embodiment 1; the same components are denoted by the same reference numerals, and their description is omitted.
- In Embodiment 1, multiplier 124 multiplies scale factor SF1(k) of the first spectrum by correction scale factor candidate v_i(k), and subtractor 125 obtains error signal d(k).
- The present embodiment differs in that the error signal d(k) is obtained by giving the scale factor candidate x_i(k) directly to the subtractor 125. That is, in the present embodiment, Equation (2) shown in Embodiment 1 is replaced by a form in which x_i(k) takes the place of the product of v_i(k) and SF1(k).
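- The two forms of the error signal can be written side by side as follows. This is a reconstruction under assumptions, not the published equations: the sign convention (target minus candidate) is taken from the description of the subtractor in the later embodiments, and SF(k), denoting the scale factor of the single input spectrum, is a symbol introduced here for illustration.

```latex
% Two-spectrum case (Embodiment 1, as implied by multiplier 124 and subtractor 125):
d(k) = SF2(k) - v_i(k)\, SF1(k)

% Single-spectrum case (present embodiment): the candidate is subtracted directly.
d(k) = SF(k) - x_i(k)
```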
- FIG. 11 is a block diagram showing the main configuration of the transform decoding apparatus according to the present embodiment.
- Separating section 451 performs separation processing on the input bitstream representing the encoding parameters, and generates the encoding parameter (scale factor) for scale factor decoding section 452 and the encoding parameter (fine spectrum) for fine spectrum decoding section 453.
- The scale factor decoding unit 452 decodes the scale factor using the encoding parameter (scale factor) obtained by the separating section 451, and supplies it to the multiplier 454.
- The fine spectrum decoding unit 453 decodes the fine spectrum using the encoding parameter (fine spectrum) obtained by the separating section 451, and supplies it to the multiplier 454.
- Multiplier 454 multiplies the fine spectrum output from fine spectrum decoding unit 453 by the scale factor output from scale factor decoding unit 452 to generate a decoded spectrum. This decoded spectrum is output to time domain transform section 455.
- Time domain transform section 455 performs time domain transform on the decoded spectrum output from multiplier 454, and outputs the obtained time domain signal as a final decoded signal.
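- The role of multiplier 454 can be sketched as a per-subband gain applied to the decoded fine spectrum. The function name and the `band_edges` layout (subband k covers bins `band_edges[k]` up to, but not including, `band_edges[k + 1]`) are assumptions made for this example.

```python
def apply_scale_factors(fine_spectrum, scale_factors, band_edges):
    """Multiply each subband of the fine spectrum by its decoded scale factor.

    Hypothetical layout: band_edges has len(scale_factors) + 1 entries.
    """
    decoded = []
    for k, sf in enumerate(scale_factors):
        lo, hi = band_edges[k], band_edges[k + 1]
        decoded.extend(sf * x for x in fine_spectrum[lo:hi])
    return decoded
```

The resulting list corresponds to the decoded spectrum that is handed to the time domain transform.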
- As described above, the present invention can also be applied to coding that consists of a single layer.
- The scale factor encoding unit 402 may attenuate in advance the scale factor of the spectrum given from the MDCT analysis unit 401 according to an index such as the bit allocation information shown in Embodiment 2 or the similarity shown in Embodiment 3, and then perform quantization using an ordinary distortion measure without weighting. As a result, deterioration of speech quality can be reduced even in a low bit rate environment.
- FIG. 12 is a block diagram showing the main configuration of a scalable coding apparatus including the transform coding apparatus according to Embodiment 5 of the present invention.
- The scalable coding apparatus mainly consists of a downsampling unit 501, a first layer coding unit 502, a multiplexing unit 503, a first layer decoding unit 504, an upsampling unit 505, a delay unit 507, a second layer coding unit 508, and a background noise analysis unit 506.
- Downsampling section 501 generates a signal of sampling rate F1 (F1 ≤ F2) from the input signal of sampling rate F2 and provides it to first layer coding section 502.
- First layer encoding section 502 encodes the signal of sampling rate F1 output from downsampling section 501.
- The encoding parameter obtained by first layer encoding section 502 is provided to multiplexing section 503 and also to first layer decoding section 504.
- First layer decoding unit 504 generates a first layer decoded signal from the encoding parameter output from first layer encoding section 502, and outputs the decoded signal to background noise analysis unit 506 and upsampling unit 505.
- Up-sampling section 505 up-samples the sampling rate of the first layer decoded signal from F1 to F2, and outputs this to second layer coding section 508.
- Background noise analysis section 506 receives the first layer decoded signal and determines whether background noise is included in this signal. When the background noise analysis unit 506 determines that background noise is included in the first layer decoded signal, it performs processing such as MDCT on the background noise and analyzes its frequency characteristics. The analyzed frequency characteristics are output to the second layer encoding section 508 as background noise information. On the other hand, when the background noise analysis unit 506 determines that background noise is not included in the first layer decoded signal, it outputs to the second layer encoding section 508 background noise information indicating that background noise is not included in the first layer decoded signal.
- As a background noise detection method, in addition to a method in which the input signal in a certain section is analyzed to calculate the maximum power value and the minimum power value of the input signal and, when the ratio or difference between them is equal to or greater than a threshold value, the minimum power value is regarded as noise, other general background noise detection methods can be employed.
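- The max/min power method mentioned above can be sketched as follows, using the ratio variant. The function name, the default threshold, and the choice to return `None` when no noise floor can be identified are assumptions made for this example.

```python
def estimate_noise_power(frame_powers, ratio_threshold=10.0):
    """Return the minimum power as the noise estimate, or None.

    frame_powers holds the power of each analysis frame in the section.
    The threshold is an illustrative value, not one given in the text.
    """
    p_max, p_min = max(frame_powers), min(frame_powers)
    if p_min <= 0.0:
        return None  # no measurable noise floor in this section
    if p_max / p_min >= ratio_threshold:
        return p_min  # large dynamic range: treat the floor as noise
    return None
```

A difference-based variant would compare `p_max - p_min` with a threshold instead of the ratio.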
- the delay unit 507 gives a delay having a predetermined length to the input signal. This delay is for correcting a time delay generated in the downsampling unit 501, the first layer coding unit 502, and the first layer decoding unit 504.
- Second layer encoding section 508 receives the up-sampled first layer decoded signal from up-sampling section 505 and the background noise information from background noise analysis section 506. Using these, it performs transform coding on the input signal that has been delayed by a predetermined time and output from the delay unit 507, and outputs the generated encoding parameter to the multiplexing unit 503.
- Multiplexing section 503 multiplexes the encoding parameter obtained by first layer encoding section 502 and the encoding parameter obtained by second layer encoding section 508, and outputs the result as the final encoding parameter.
- FIG. 13 is a block diagram showing the main configuration inside second layer encoding section 508.
- Second layer encoding section 508 includes MDCT analysis sections 511 and 512, high band spectrum estimation section 513, and corrected scale factor encoding section 514, and each section performs the following operations.
- MDCT analysis section 511 performs MDCT analysis on the first layer decoded signal to calculate a low-band spectrum (narrowband spectrum) of signal band (frequency band) 0 to FL, and outputs it to high-band spectrum estimation section 513.
- The MDCT analysis unit 512 performs MDCT analysis on the speech signal that is the original signal and calculates a wideband spectrum of signal band 0 to FH. Of this wideband spectrum, the high-band spectrum, which has the same bandwidth as the narrowband spectrum and occupies the high band FL to FH, is output to the high-band spectrum estimation unit 513 and the correction scale factor encoding unit 514.
- Here, the relationship FL < FH holds between the signal band of the narrowband spectrum and the signal band of the wideband spectrum.
- Highband spectrum estimation section 513 estimates the highband spectrum of signal bands FL to FH using the lowband spectrum of signal bands 0 to FL to obtain an estimated spectrum.
- The estimated spectrum is derived by transforming the low-band spectrum so that the similarity of the result to the high-band spectrum is maximized.
- the high-frequency spectrum estimation unit 513 encodes information (estimation information) related to the estimated spectrum and outputs the obtained encoding parameters.
- In the following description, the estimated spectrum output from the high-band spectrum estimation unit 513 is referred to as the first spectrum, and the high-band spectrum output from the MDCT analysis unit 512 is referred to as the second spectrum.
- The signal bands of the spectra named above are as follows. Narrowband spectrum (low-band spectrum): 0 to FL. Wideband spectrum: 0 to FH.
- The correction scale factor encoding unit 514 encodes and outputs information on the scale factor of the second spectrum using the background noise information.
- FIG. 14 is a block diagram showing the main configuration inside the correction scale factor encoding section 514.
- The correction scale factor encoding unit 514 includes a scale factor calculation unit 521, a correction scale factor codebook 522, a subtractor 523, a determination unit 524, a weighted error calculation unit 525, and a search unit 526, and each unit performs the following operations.
- Scale factor calculation section 521 divides signal band FL to FH of the input second spectrum into a plurality of subbands, obtains the magnitude of the spectrum included in each subband, and outputs it to subtractor 523. Specifically, the division into subbands is performed in association with the critical bands, at equal intervals on the Bark scale.
- The scale factor calculation unit 521 calculates the average amplitude of the spectrum included in each subband and takes this as the second scale factor SF2(k) {0 ≤ k < NB}.
- Here, NB represents the number of subbands.
- The maximum amplitude value or the like may be used instead of the average amplitude.
- In the subsequent processing, the parameters of the plurality of subbands are combined into one vector value.
- For example, the NB scale factors are expressed as one vector.
- A case where each process is performed on such a vector, that is, a case where vector quantization is performed, will be described as an example.
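- A minimal sketch of the scale factor calculation is given below. It assumes that the subband boundaries have already been derived (for example, from an equal-interval division on the Bark scale) and are passed in as bin indices; the function name and the `use_max` switch are hypothetical.

```python
def scale_factors(spectrum, band_edges, use_max=False):
    """Return one scale factor per subband as a vector of length NB.

    band_edges holds NB + 1 bin indices; subband k spans
    band_edges[k] up to, but not including, band_edges[k + 1].
    """
    result = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        magnitudes = [abs(x) for x in spectrum[lo:hi]]
        if use_max:
            result.append(max(magnitudes))                     # maximum amplitude
        else:
            result.append(sum(magnitudes) / len(magnitudes))   # average amplitude
    return result
```

The returned list plays the role of the vector of NB scale factors that is subsequently vector-quantized.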
- The correction scale factor codebook 522 stores a plurality of correction scale factor candidates and, in accordance with an instruction from the search unit 526, sequentially outputs the stored correction scale factor candidates one at a time to the subtractor 523. The correction scale factor candidates stored in the correction scale factor codebook 522 are represented by vectors.
- The subtractor 523 subtracts the correction scale factor candidate, which is the output of the correction scale factor codebook 522, from the second scale factor output from the scale factor calculation unit 521, and gives the resulting error signal to the weighted error calculation unit 525 and the determination unit 524.
- The determination unit 524 determines the weight vector to be given to the weighted error calculation unit 525 based on the sign of the error signal given from the subtractor 523 and the background noise information.
- a specific processing flow in the determination unit 524 will be described.
- The determination unit 524 analyzes the input background noise information. The determination unit 524 holds a background noise flag BNF(k) {0 ≤ k < NB} whose number of elements equals the number of subbands NB. When the background noise information indicates that background noise is not included in the input signal (first layer decoded signal), the determination unit 524 sets all values of the background noise flag BNF(k) to 0. When the background noise information indicates that background noise is included in the input signal (first layer decoded signal), the determination unit 524 converts the frequency characteristics of the background noise indicated by the background noise information into frequency characteristics for each subband. For simplicity, the background noise information is treated here as indicating the average power value of the spectrum for each subband.
- The determination unit 524 compares the average power value SP(k) of the spectrum for each subband with a threshold ST(k) set in advance for each subband, and if SP(k) is equal to or greater than ST(k), sets the value of the background noise flag BNF(k) of the corresponding subband to 1.
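- The flag computation can be sketched as below. The sketch assumes that the flags are all cleared when no background noise is reported and that the per-subband average powers SP(k) and thresholds ST(k) are supplied as lists; the function and parameter names are hypothetical.

```python
def background_noise_flags(sp, st, noise_present=True):
    """Return BNF(k) for every subband.

    sp: average noise power per subband, SP(k)
    st: preset threshold per subband, ST(k)
    noise_present: False when the background noise information reports
                   that the signal contains no background noise
    """
    if not noise_present:
        return [0] * len(sp)
    return [1 if p >= t else 0 for p, t in zip(sp, st)]
```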
- Here, the error signal output from the subtractor 523 is d(k) = SF2(k) - v_i(k), where v_i(k) represents the i-th correction scale factor candidate.
- The determination unit 524 selects w_pos as the weight when the sign of d(k) is positive. The determination unit 524 also selects w_pos as the weight when the sign of d(k) is negative and the value of the background noise flag BNF(k) is 1.
- The determination unit 524 selects w_neg as the weight when the sign of d(k) is negative and the value of the background noise flag BNF(k) is 0. The determination unit 524 then outputs the weight vector w(k) composed of these weights to the weighted error calculation unit 525.
- These weights have the magnitude relationship of Equation (7): $w_{pos} < w_{neg}$.
- The weighted error calculation unit 525 first calculates the square of the error signal given from the subtractor 523, then multiplies the squared error signal by the weight vector w(k) given from the determination unit 524 to calculate the weighted square error E, and gives the calculation result to the search unit 526.
- The weighted square error E is expressed by Equation (8): $E = \sum_{k=0}^{NB-1} w(k)\, d(k)^2$.
- Search section 526 controls correction scale factor codebook 522 so that the stored correction scale factor candidates are output sequentially and, by closed-loop processing, finds the correction scale factor candidate that minimizes the weighted square error E output from weighted error calculation section 525. Search section 526 outputs the index i_opt of the correction scale factor candidate thus found as an encoding parameter.
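- The closed-loop search can be sketched as follows. The weight selection rule used here (the smaller weight for a positive error or for a subband flagged as containing background noise, the larger weight otherwise) is a reading of the description above, and the numeric defaults for `w_pos` and `w_neg` are purely illustrative.

```python
def select_weights(d, bnf, w_pos, w_neg):
    """Pick one weight per subband from the error sign and the noise flag."""
    return [w_pos if (dk >= 0 or flag == 1) else w_neg
            for dk, flag in zip(d, bnf)]


def search_codebook(sf2, codebook, bnf, w_pos=1.0, w_neg=4.0):
    """Return (index, error) of the candidate minimizing the weighted error.

    sf2: target scale factor vector; codebook: list of candidate vectors.
    """
    best_index, best_error = -1, float("inf")
    for i, candidate in enumerate(codebook):
        d = [t - v for t, v in zip(sf2, candidate)]          # error signal d(k)
        w = select_weights(d, bnf, w_pos, w_neg)             # weight vector w(k)
        e = sum(wk * dk * dk for wk, dk in zip(w, d))        # weighted square error
        if e < best_error:
            best_index, best_error = i, e
    return best_index, best_error
```

With `bnf` all zero the search leans toward candidates below the target; setting a flag to 1 removes that bias for the corresponding subband.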
- Furthermore, a perceptually better decoded signal can be obtained by adjusting the degree of the above action according to whether or not background noise is included in the input signal (first layer decoded signal). This trend was confirmed by computer simulation.
- the decoding apparatus of the present embodiment is different from that of Embodiment 1 only in the internal configuration of second layer decoding section 153.
- the main configuration of second layer decoding section 153 according to the present embodiment will be described below using FIG.
- Second layer decoding section 153 is the component corresponding to second layer encoding section 508 in the transform coding apparatus according to the present embodiment.
- MDCT analysis section 561 performs MDCT analysis on the first layer decoded signal, calculates the first spectrum of signal bands 0 to FL, and outputs the first spectrum to highband spectrum decoding section 562.
- The high-band spectrum decoding unit 562 decodes the estimated spectrum (fine spectrum) of signal band FL to FH using the first spectrum and the encoding parameter (estimation information) transmitted from the transform coding apparatus according to the present embodiment. The obtained estimated spectrum is given to the high-band spectrum normalization unit 563.
- Correction scale factor decoding unit 564 decodes the correction scale factor using the encoding parameter (correction scale factor) sent from the transform coding apparatus according to the present embodiment. Specifically, it refers to the built-in correction scale factor codebook 522 (not shown) and outputs the corresponding correction scale factor to the multiplier 565.
- The high-band spectrum normalization unit 563 divides signal band FL to FH of the estimated spectrum output from the high-band spectrum decoding unit 562 into a plurality of subbands and obtains the magnitude of the spectrum included in each subband. Specifically, the division into subbands is performed in association with the critical bands, at equal intervals on the Bark scale. The average amplitude of the spectrum included in each subband is then obtained and taken as the first scale factor SF1(k) {0 ≤ k < NB}.
- Here, NB represents the number of subbands. A maximum amplitude value or the like may be used instead of the average amplitude.
- The high-band spectrum normalization unit 563 divides the estimated spectrum value (MDCT value) by the first scale factor SF1(k) for each subband and outputs the result to multiplier 565 as a normalized estimated spectrum.
- Multiplier 565 multiplies the normalized estimated spectrum output from high-band spectrum normalization section 563 by the correction scale factor output from correction scale factor decoding section 564, and outputs the multiplication result to concatenating unit 566.
- Concatenating unit 566 concatenates the first spectrum and the normalized estimated spectrum output from the multiplier on the frequency axis to generate a wideband decoded spectrum of signal band 0 to FH, and outputs it to the time domain transform unit 567.
- Time domain transform section 567 performs inverse MDCT processing on the decoded spectrum output from concatenating unit 566, multiplies the result by an appropriate window function, and then adds it to the corresponding region of the windowed signal of the previous frame to generate and output the second layer decoded signal.
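- The normalization, scaling, and concatenation steps on the decoding side can be sketched as follows. The sketch assumes average-amplitude scale factors and subband boundaries given as bin indices relative to the start of the high band; the function names and the treatment of an all-zero subband are assumptions. The inverse MDCT and overlap-add stage is not shown.

```python
def scale_high_band(estimated, band_edges, correction_sf):
    """Normalize each subband by its first scale factor, then apply the
    decoded correction scale factor for that subband."""
    out = []
    for k, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        band = estimated[lo:hi]
        sf1 = sum(abs(x) for x in band) / len(band)   # first scale factor SF1(k)
        gain = correction_sf[k] / sf1 if sf1 > 0.0 else 0.0
        out.extend(gain * x for x in band)
    return out


def wideband_spectrum(first_spectrum, estimated, band_edges, correction_sf):
    """Concatenate the low band (0 to FL) with the scaled high band (FL to FH)."""
    return list(first_spectrum) + scale_high_band(estimated, band_edges, correction_sf)
```

After this step every high-band subband has an average amplitude equal to its decoded correction scale factor.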
- the scale factor is quantized using a weighted distortion scale that facilitates selection of a quantization candidate with a small scale factor. That is, it is easy to select a scale factor after quantization that is smaller than the scale factor before quantization. Therefore, even when the number of bits allocated to the quantization of the scale factor is insufficient, it is possible to suppress the deterioration of the subjective quality of hearing.
- the case where vector quantization is used has been described as an example. Instead of vector quantization, that is, processing for each vector, processing may be performed independently for each subband.
- In that case, the correction scale factor candidates included in the correction scale factor codebook 522 are represented by scalars.
- In the present embodiment, the value of the background noise flag BNF(k) is determined by comparing the average power value of the background noise for each subband with the threshold value. However, the present invention is not limited to this, and is equally applicable to a method that uses the ratio of the average power value of the background noise for each subband to the average power value for each subband of the first layer decoded signal (speech part).
- In the present embodiment, the configuration in which the upsampling unit 505 is provided in the coding apparatus has been described. However, the present invention is not limited to this, and is equally applicable to the case where no upsampling unit is provided and the first layer decoded signal is input to the second layer encoding section.
- Furthermore, the present invention is not limited to the above, and can also be applied to the case of switching whether or not to use the above-described method according to the characteristics of the input signal (for example, whether it is voiced or unvoiced).
- For example, vector quantization by distance calculation using the above-described weights is performed for portions where the input signal contains speech, whereas it is not performed for portions where the input signal does not contain speech.
- Embodiment 6 of the present invention differs from Embodiment 5 only in the internal configuration of the second layer encoding section of the encoding apparatus.
- FIG. 16 is a block diagram showing the main configuration inside second layer encoding section 508 according to the present embodiment.
- Second layer encoding section 508 shown in FIG. 16 differs from the configuration shown in Embodiment 5 in that the operation of correction scale factor encoding section 614 differs from that of correction scale factor encoding section 514.
- The high-band spectrum estimation unit 513 gives the estimated spectrum itself to correction scale factor encoding section 614.
- The correction scale factor encoding unit 614 uses the background noise information to correct the scale factor of the first spectrum so that it approaches the scale factor of the second spectrum, and encodes and outputs information on this correction scale factor.
- FIG. 17 is a block diagram showing the main configuration inside correction scale factor encoding section 614 in FIG. 16.
- The correction scale factor encoding unit 614 includes scale factor calculation units 621 and 622, a correction scale factor codebook 623, a multiplier 624, a subtractor 625, a determination unit 626, a weighted error calculation unit 627, and a search unit 628, and each unit performs the following operations.
- Scale factor calculation section 621 divides signal band FL to FH of the input second spectrum into a plurality of subbands, obtains the magnitude of the spectrum included in each subband, and outputs the result to subtractor 625. Specifically, the division into subbands is performed in association with the critical bands, at equal intervals on the Bark scale.
- The scale factor calculation unit 621 calculates the average amplitude of the spectrum included in each subband and takes this as the second scale factor SF2(k) {0 ≤ k < NB}.
- Here, NB represents the number of subbands.
- The maximum amplitude value or the like may be used instead of the average amplitude.
- In the subsequent processing, the parameters of the plurality of subbands are combined into one vector value.
- For example, the NB scale factors are expressed as one vector.
- A case where each process is performed on such a vector, that is, a case where vector quantization is performed, will be described as an example.
- Similarly, the scale factor calculation unit 622 divides signal band FL to FH of the input first spectrum into a plurality of subbands, calculates the first scale factor SF1(k) {0 ≤ k < NB} of each subband, and outputs it to the multiplier 624.
- As with the scale factor calculation unit 621, the maximum amplitude value or the like may be used instead of the average amplitude.
- Correction scale factor codebook 623 stores a plurality of correction scale factor candidates and, in accordance with an instruction from search section 628, sequentially outputs the stored correction scale factor candidates one at a time to multiplier 624. The correction scale factor candidates stored in the correction scale factor codebook 623 are represented by vectors. Multiplier 624 multiplies the first scale factor output from scale factor calculation section 622 by the correction scale factor candidate output from correction scale factor codebook 623, and gives the multiplication result to subtractor 625.
- The subtractor 625 subtracts the output of the multiplier 624, that is, the product of the first scale factor and the correction scale factor candidate, from the second scale factor output from the scale factor calculation unit 621, and supplies the resulting error signal to the determination unit 626 and the weighted error calculation unit 627.
- Determination unit 626 determines a weight vector to be given to the weighted error calculation unit based on the sign of the error signal given from subtractor 625 and the background noise information.
- a specific processing flow in the determination unit will be described.
- The determination unit 626 analyzes the input background noise information. The determination unit 626 holds a background noise flag BNF(k) {0 ≤ k < NB} whose number of elements equals the number of subbands NB. When the background noise information indicates that background noise is not included in the input signal (first layer decoded signal), the determination unit 626 sets all values of the background noise flag BNF(k) to 0. When the background noise information indicates that background noise is included in the input signal (first layer decoded signal), the determination unit 626 converts the frequency characteristics of the background noise indicated by the background noise information into frequency characteristics for each subband.
- the background noise information is treated here as indicating the average power value of the spectrum for each subband.
- the determination unit 626 compares the average power value SP (k) of the spectrum for each subband with a threshold ST (k) for each subband set in advance, and SP (k) is equal to or greater than ST (k). If it is, set the value of the background noise flag BNF (k) of the corresponding subband to 1.
- Here, the error signal output from the subtractor 625 is d(k) = SF2(k) - v_i(k) SF1(k), where v_i(k) represents the i-th correction scale factor candidate.
- The determination unit 626 selects w_pos as the weight when the sign of d(k) is positive. The determination unit 626 also selects w_pos as the weight when the sign of d(k) is negative and the value of the background noise flag BNF(k) is 1.
- The determination unit 626 selects w_neg as the weight when the sign of d(k) is negative and the value of the background noise flag BNF(k) is 0.
- The determination unit 626 then outputs the weight vector w(k) composed of these weights to the weighted error calculation unit 627.
- These weights have the magnitude relationship of Equation (10): $w_{pos} < w_{neg}$.
- As an example, consider the case where the number of subbands is NB = 4, the signs of d(k) are {+, -, -, +}, and the background noise flag BNF(k) is {0, 0, 1, 1}.
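- The example values can be worked through with a short sketch. It assumes the selection rule read from the surrounding description: the weight for a negative error is w_neg only when the background noise flag of that subband is 0, and w_pos is used in every other case. The function name and the use of strings in place of numeric weights are choices made for readability.

```python
def weight_vector(signs, bnf, w_pos="w_pos", w_neg="w_neg"):
    """Return the weight chosen for each subband.

    signs: "+" or "-" per subband (sign of d(k)); bnf: BNF(k) per subband.
    """
    return [w_neg if (s == "-" and flag == 0) else w_pos
            for s, flag in zip(signs, bnf)]


# Values from the example: NB = 4
print(weight_vector(["+", "-", "-", "+"], [0, 0, 1, 1]))
```

Under this assumed rule, only the second subband (negative error, no background noise) receives the larger weight.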
- The weighted error calculation unit 627 first calculates the square of the error signal given from the subtractor 625, then multiplies the squared error signal by the weight vector w(k) given from the determination unit 626 to calculate the weighted square error E, and gives the calculation result to the search unit 628.
- The weighted square error E is expressed by Equation (11): $E = \sum_{k=0}^{NB-1} w(k)\, d(k)^2$.
- Search section 628 controls correction scale factor codebook 623 so that the stored correction scale factor candidates are output sequentially and, by closed-loop processing, finds the correction scale factor candidate that minimizes the weighted square error E output from weighted error calculation section 627. Search section 628 outputs the index i_opt of the correction scale factor candidate thus found as an encoding parameter.
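- For the two-spectrum configuration, the search can be sketched with the product of the first scale factor and the candidate as the value compared against the target. As before, the weight selection rule is a reading of the description, and the default weights are illustrative values only.

```python
def weighted_error(sf1, sf2, candidate, bnf, w_pos=1.0, w_neg=4.0):
    """Weighted square error for one correction scale factor candidate."""
    e = 0.0
    for s1, s2, v, flag in zip(sf1, sf2, candidate, bnf):
        d = s2 - v * s1                                   # error signal d(k)
        w = w_neg if (d < 0 and flag == 0) else w_pos     # weight w(k)
        e += w * d * d
    return e


def search_correction(sf1, sf2, codebook, bnf):
    """Return the index of the candidate with the smallest weighted error."""
    errors = [weighted_error(sf1, sf2, c, bnf) for c in codebook]
    return min(range(len(codebook)), key=errors.__getitem__)
```

Because overshooting the target is penalized more heavily when the flag is 0, the search tends to return a candidate whose decoded scale factor lies below the second scale factor.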
- The case where the error signal d(k) is positive is a case where the decoded value generated on the decoding side (in terms of the encoding side, the value obtained by multiplying the first scale factor by the correction scale factor candidate) is smaller than the second scale factor, which is the target value.
- Conversely, the case where the error signal d(k) is negative is a case where the decoded value generated on the decoding side is larger than the second scale factor, which is the target value. Therefore, by setting the weight used when the error signal d(k) is positive to be smaller than the weight used when the error signal d(k) is negative, a correction scale factor candidate that generates a decoded value smaller than the second scale factor is more easily selected.
- As a result, the decoded scale factor tends to be smaller than the target value, so the quantized scale factor acts in the direction of attenuating the estimated spectrum. The low accuracy of the estimated spectrum therefore becomes less noticeable, and the sound quality of the decoded signal is improved.
- a better decoded signal can be obtained audibly by adjusting the degree of the above action according to whether or not background noise is included in the input signal (first layer decoded signal). This trend was confirmed by computer simulation.
- The present invention is not limited to the above, and is similarly applicable to the case of switching whether or not to use the method described above according to the characteristics of the input signal (for example, whether it is voiced or unvoiced).
- For example, for portions where the input signal contains speech, vector quantization is performed by distance calculation using the above-described weights, whereas for portions where the input signal does not contain speech, vector quantization is performed by the method shown in Embodiments 1 to 4 instead of by distance calculation using the above-described weights. By switching the distance calculation method of the vector quantization along the time axis according to the characteristics of the input signal in this way, a higher-quality decoded signal can be obtained.
- FIG. 18 is a block diagram showing the main configuration of the scalable decoding device according to Embodiment 7 of the present invention.
- Separation unit 701 receives a bitstream sent from an encoding device (not shown), separates the bitstream based on the layer information recorded in it, and outputs the layer information to switching unit 705 and to modified LPC calculation unit 708 of the post filter.
- When the layer information indicates layer 3, separation unit 701 separates the first layer encoded information, the second layer encoded information, and the third layer encoded information from the bitstream. The separated first layer encoded information is output to first layer decoding unit 702, the second layer encoded information to second layer decoding unit 703, and the third layer encoded information to third layer decoding unit 704.
- When the layer information indicates layer 2, separation unit 701 separates the first layer encoded information and the second layer encoded information from the bitstream. The separated first layer encoded information is output to first layer decoding unit 702, and the second layer encoded information is output to second layer decoding unit 703.
- When the layer information indicates layer 1, separation unit 701 separates only the first layer encoded information from the bitstream and outputs it to first layer decoding unit 702.
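The routing performed by separation unit 701 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `route_layers` and the representation of the bitstream as a list of already-separated per-layer payloads are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical mapping: index 0 -> first layer decoding unit 702, and so on.
DECODING_UNITS = (702, 703, 704)


def route_layers(layer: int, payloads: List[bytes]) -> Dict[int, bytes]:
    """Return which decoding unit receives which layer's encoded information.

    `layer` is the layer information (1, 2, or 3); `payloads` holds the
    per-layer encoded information in layer order (an assumed representation).
    """
    if layer not in (1, 2, 3):
        raise ValueError("layer information must be 1, 2, or 3")
    if len(payloads) < layer:
        raise ValueError("fewer payloads than the layer information states")
    # Only the layers actually present are routed; higher units get nothing.
    return {DECODING_UNITS[i]: payloads[i] for i in range(layer)}
```

With layer information 2, for example, only units 702 and 703 receive data, which matches the case where third layer decoding unit 704 has nothing to decode.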
- First layer decoding unit 702 uses the first layer encoded information output from separation unit 701 to generate a first layer decoded signal of basic quality over the signal band 0 ≤ k < FH, and outputs the generated first layer decoded signal to switching unit 705, second layer decoding unit 703, and background noise detection unit 706.
- When second layer encoded information is output from separation unit 701, second layer decoding unit 703 uses this second layer encoded information and the first layer decoded signal output from first layer decoding unit 702 to generate a second layer decoded signal that has improved quality over the band 0 ≤ k < FL and basic quality over the band FL ≤ k < FH. The generated second layer decoded signal is output to switching unit 705 and third layer decoding unit 704. When the layer information indicates layer 1, no second layer encoded information is obtained, so second layer decoding unit 703 either does not operate at all or only updates the variables it holds.
- When third layer encoded information is output from separation unit 701, third layer decoding unit 704 uses this third layer encoded information and the second layer decoded signal output from second layer decoding unit 703 to generate a third layer decoded signal of improved quality over the band 0 ≤ k < FH. The generated third layer decoded signal is output to switching unit 705. When the layer information indicates layer 1 or layer 2, no third layer encoded information is obtained, so third layer decoding unit 704 either does not operate at all or only updates the variables it holds.
- Background noise detection unit 706 receives the first layer decoded signal and determines whether this signal contains background noise. If it determines that the first layer decoded signal contains background noise, it applies frequency analysis such as MDCT to the background noise and outputs the analyzed frequency characteristic to modified LPC calculation unit 708 as background noise information. If it determines that the first layer decoded signal does not contain background noise, it outputs background noise information indicating that fact to modified LPC calculation unit 708.
- As a background noise detection method, the input signal in a certain section can be analyzed to calculate its maximum power value and minimum power value, and the minimum power value can be taken as noise when the ratio or difference between the two is greater than or equal to a threshold value. Other general background noise detection methods can also be employed.
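The max/min power test described above can be sketched as follows. The frame length, the 20 dB threshold, and the function names are illustrative assumptions; the text only states that the ratio (or difference) of maximum to minimum power is compared with a threshold.

```python
import math
from typing import List, Optional


def frame_powers(signal: List[float], frame_len: int) -> List[float]:
    """Mean power of each non-overlapping frame of length `frame_len`."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def estimate_noise_power(signal: List[float], frame_len: int = 160,
                         ratio_threshold_db: float = 20.0) -> Optional[float]:
    """Return the minimum frame power as the noise estimate, or None.

    The minimum power is treated as noise only when the max/min power
    ratio reaches the (assumed) threshold in dB.
    """
    powers = frame_powers(signal, frame_len)
    if not powers:
        return None
    p_max, p_min = max(powers), min(powers)
    eps = 1e-12  # avoids division by zero for digitally silent frames
    ratio_db = 10.0 * math.log10((p_max + eps) / (p_min + eps))
    return p_min if ratio_db >= ratio_threshold_db else None
```

A signal whose frames all have similar power yields `None` under this rule, because the section offers no quiet frame from which a noise level could be read.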
- In this embodiment, background noise detection unit 706 determines whether background noise is included in the first layer decoded signal. However, the present invention is not limited to this, and can likewise be applied to the case where it is detected whether background noise is included in the second layer decoded signal or the third layer decoded signal, and to the case where background noise information for the input signal is transmitted from the encoding device and the transmitted background noise information is used.
- Switching unit 705 determines, based on the layer information output from separation unit 701, up to which layer a decoded signal has been obtained, and outputs the decoded signal of the highest layer to modified LPC calculation unit 708 and filter unit 707.
- The post filter comprises modified LPC calculation unit 708 and filter unit 707.
- Modified LPC calculation unit 708 calculates modified LPC coefficients using the layer information output from separation unit 701, the decoded signal output from switching unit 705, and the background noise information obtained from background noise detection unit 706, and outputs the calculated modified LPC coefficients to filter unit 707.
- Filter unit 707 configures a filter from the modified LPC coefficients output from modified LPC calculation unit 708, applies post-filter processing to the decoded signal output from switching unit 705, and outputs the post-filtered decoded signal.
- FIG. 19 is a block diagram showing an internal configuration of modified LPC calculation section 708 shown in FIG.
- Frequency conversion unit 711 performs frequency analysis of the decoded signal output from switching unit 705, obtains the spectrum of the decoded signal (hereinafter referred to as the "decoded spectrum"), and outputs the obtained decoded spectrum to power spectrum calculation unit 712.
- Power spectrum calculation unit 712 calculates the power of the decoded spectrum output from frequency conversion unit 711 (hereinafter referred to as the "power spectrum"), and outputs the calculated power spectrum to power spectrum correction unit 713.
- Correction band determination unit 714 determines the band in which the spectrum is to be corrected (hereinafter referred to as the "correction band"), and outputs the determined band to power spectrum correction unit 713 as correction band information.
- If the layer information indicates layer 1, correction band determination unit 714 sets the correction band to 0 (no correction). If the layer information indicates layer 2, the correction band is set to 0 to FL. If the layer information indicates layer 3, the correction band is set to 0 to FH.
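The layer-to-band rule of correction band determination unit 714 is simple enough to write down directly. In this sketch the band is a half-open range of spectral bin indices, and an empty range `(0, 0)` stands for "no correction"; both representations are assumptions made for illustration.

```python
from typing import Tuple


def correction_band(layer: int, fl: int, fh: int) -> Tuple[int, int]:
    """Return the correction band [start, end) in spectral bins.

    `fl` and `fh` are the bin indices corresponding to FL and FH.
    Layer 1 is assumed to mean an empty band (no correction).
    """
    if layer == 1:
        return (0, 0)
    if layer == 2:
        return (0, fl)
    if layer == 3:
        return (0, fh)
    raise ValueError("layer information must be 1, 2, or 3")
```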
- Power spectrum correction unit 713 corrects the power spectrum output from power spectrum calculation unit 712 based on the correction band information output from correction band determination unit 714 and on the background noise information, and outputs the corrected power spectrum to inverse transform unit 715.
- Here, when the background noise information indicates that "the first layer decoded signal does not contain background noise", correcting the power spectrum means weakening the post-filter characteristic so that the spectrum is deformed less. More specifically, it means applying a correction that suppresses variation of the power spectrum along the frequency axis.
- Accordingly, when the layer information indicates layer 2, the post-filter characteristic in the band 0 to FL is weakened, and when the layer information indicates layer 3, the post-filter characteristic in the band 0 to FH is weakened.
- When the background noise information indicates that "the first layer decoded signal contains background noise", power spectrum correction unit 713 either does not perform this weakening or reduces its degree.
- Inverse transform unit 715 applies an inverse transform to the corrected power spectrum output from power spectrum correction unit 713 to obtain an autocorrelation function. The obtained autocorrelation function is output to LPC analysis unit 716.
- Inverse transform unit 715 can reduce the amount of calculation by using an FFT (Fast Fourier Transform).
- LPC analysis unit 716 obtains LPC coefficients from the autocorrelation function output from inverse transform unit 715 using the autocorrelation method or the like, and outputs the obtained LPC coefficients to filter unit 707 as modified LPC coefficients.
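The chain from a (corrected) power spectrum to modified LPC coefficients can be sketched as follows. This is a minimal illustration under stated assumptions: the power spectrum is the full N-point, symmetric spectrum; the inverse transform is written as a direct cosine sum rather than an FFT for clarity; the Levinson-Durbin recursion stands in for "the autocorrelation method or the like"; and the sign convention A(z) = 1 + Σ a(i) z⁻ⁱ is chosen for the example.

```python
import math
from typing import List, Tuple


def autocorrelation_from_power(power: List[float], order: int) -> List[float]:
    """Inverse DFT of a symmetric N-point power spectrum, lags 0..order."""
    n = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * k * lag / n)
            for k, p in enumerate(power)) / n
        for lag in range(order + 1)
    ]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for A(z) = 1 + sum a[i] z^-i.

    Returns (a, err) where a[0] == 1 and err is the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input; keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

Feeding the power spectrum of a first-order all-pole model 1/(1 − 0.5 z⁻¹) through both steps recovers the coefficient −0.5, which is a convenient sanity check for the sign convention.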
- FIG. 21 shows how the power spectrum is corrected by the first realization method.
- This figure shows how the power spectrum of a voiced part of female speech (n /) is corrected when the layer information indicates layer 2 (that is, when the post-filter characteristic in the 0 to FL band is weakened).
- The power spectrum in the 0 to FL band is replaced with a flat power spectrum of about 22 dB.
- It is desirable to correct the spectrum so that it does not change discontinuously at the connection between the corrected band and the uncorrected band.
- For example, a moving average is computed for the power spectrum at the connection portion and in its vicinity, and the corresponding power spectrum values are replaced with the moving average. This makes it possible to obtain modified LPC coefficients with accurate spectral characteristics.
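The first method, flattening the correction band and smoothing the junction, can be sketched as below. The choice of the band mean as the flat level (which preserves the band energy) and the moving-average half width are assumptions; the text only says that the band is replaced with a flat level and that a moving average is applied around the connection.

```python
from typing import List


def flatten_band(power: List[float], lo: int, hi: int) -> List[float]:
    """Replace power[lo:hi] with its mean, keeping the band energy."""
    out = list(power)
    if hi > lo:
        mean = sum(power[lo:hi]) / (hi - lo)
        out[lo:hi] = [mean] * (hi - lo)
    return out


def smooth_junction(power: List[float], edge: int,
                    half_width: int = 2) -> List[float]:
    """Moving-average the bins within `half_width` of the band edge.

    Averages are taken over the unsmoothed input so that the result does
    not depend on the order in which bins are processed.
    """
    n = len(power)
    out = list(power)
    for k in range(max(0, edge - half_width), min(n, edge + half_width)):
        a, b = max(0, k - half_width), min(n, k + half_width + 1)
        out[k] = sum(power[a:b]) / (b - a)
    return out
```

Applying `flatten_band` first and `smooth_junction` at the upper band edge afterwards turns the step at FL into a short ramp.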
- The second method is to obtain the spectral tilt of the power spectrum in the correction band and to replace the spectrum in that band with the obtained spectral tilt.
- The spectral tilt indicates the overall slope of the power spectrum in the band.
- For example, the spectral characteristic of a digital filter formed from the first-order PARCOR coefficient (reflection coefficient) of the decoded signal, or from that PARCOR coefficient multiplied by a constant, is used.
- The power spectrum of the band is replaced with this spectral characteristic multiplied by a coefficient calculated so that the energy of the power spectrum in the band is preserved.
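The second method can be sketched as follows. Several details are assumptions made for illustration: the first-order PARCOR coefficient is taken as r(1)/r(0), the tilt filter is the one-pole filter 1/(1 − μ z⁻¹) with μ equal to the scaled coefficient, `power` is a half spectrum whose bin k maps to the angular frequency πk/len(power), and μ is clamped to keep the response finite.

```python
import math
from typing import List


def first_parcor(signal: List[float]) -> float:
    """First-order reflection coefficient, taken here as r(1)/r(0)."""
    r0 = sum(x * x for x in signal)
    r1 = sum(signal[i] * signal[i + 1] for i in range(len(signal) - 1))
    return r1 / r0 if r0 > 0.0 else 0.0


def tilt_replace(power: List[float], lo: int, hi: int,
                 k1: float, scale: float = 1.0) -> List[float]:
    """Replace power[lo:hi] with an energy-matched one-pole tilt shape."""
    if hi <= lo:
        return list(power)
    n = len(power)
    mu = max(-0.99, min(0.99, scale * k1))  # clamp: assumed safety margin
    # |1 / (1 - mu e^{-jw})|^2 evaluated at the bins of the band.
    shape = [1.0 / (1.0 - 2.0 * mu * math.cos(math.pi * k / n) + mu * mu)
             for k in range(lo, hi)]
    gain = sum(power[lo:hi]) / sum(shape)  # preserves the band energy
    out = list(power)
    out[lo:hi] = [gain * s for s in shape]
    return out
```

For a decoded signal with positive first-order correlation the replaced band slopes downward with frequency, while the total energy in the band is unchanged.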
- FIG. 22 shows how the power spectrum is corrected by the second realization method.
- The power spectrum in the 0 to FL band is replaced with a sloped power spectrum of about 23 dB to 26 dB.
- In Equation (12), a(i) is the LPC (linear prediction) coefficient of the decoded signal, NP is the order of the LPC coefficients, γn and γd are setting values that determine the degree of noise suppression of the post filter (0 < γn < γd < 1), and μ is a setting value for compensating the spectral tilt caused by the formant emphasis filter.
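Equation (12) itself is not reproduced in this text. A common post-filter form that uses exactly the symbols described, and that is assumed here purely for illustration, is PF(z) = A(z/γn) / A(z/γd) · (1 − μ z⁻¹) with A(z) = 1 + Σ a(i) z⁻ⁱ for i = 1..NP. Under that assumption, filter unit 707 could be sketched as below; any overall gain factor is omitted.

```python
from typing import List


def bandwidth_expand(a: List[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma): a[i] * gamma**i, with a[0] == 1."""
    return [c * gamma ** i for i, c in enumerate(a)]


def iir_filter(b: List[float], a: List[float], x: List[float]) -> List[float]:
    """Direct-form filter: y[n] = (sum b[i] x[n-i] - sum a[i] y[n-i]) / a[0]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc / a[0])
    return y


def post_filter(x: List[float], a: List[float], gamma_n: float,
                gamma_d: float, mu: float) -> List[float]:
    """Apply the assumed form A(z/gamma_n)/A(z/gamma_d) * (1 - mu z^-1)."""
    shaped = iir_filter(bandwidth_expand(a, gamma_n),
                        bandwidth_expand(a, gamma_d), x)
    return iir_filter([1.0, -mu], [1.0], shaped)  # tilt compensation
```

The closer γn is to γd, the closer the formant-emphasis stage is to an all-pass identity, which is one way to read "weakening" the post filter.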
- Alternatively, a power spectrum obtained by raising the power spectrum in the correction band to the power α (0 < α < 1) may be used.
- In this case, the post-filter characteristic can be designed more flexibly than with the method of flattening the power spectrum described above.
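Raising the power spectrum to a power α between 0 and 1 scales its dynamic range in dB by the factor α, so α acts as a continuous control between "unchanged" (α near 1) and "flat" (α near 0). A minimal sketch, with a hypothetical function name:

```python
from typing import List


def compress_band(power: List[float], lo: int, hi: int,
                  alpha: float) -> List[float]:
    """Raise power[lo:hi] to the exponent alpha (0 < alpha < 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    out = list(power)
    out[lo:hi] = [p ** alpha for p in power[lo:hi]]
    return out
```

With α = 0.5, a 40 dB spread between the largest and smallest bins in the band becomes a 20 dB spread.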
- In this example, the order of the LPC coefficients is 18.
- The solid line in FIG. 23 represents the spectral characteristic of the post filter when the power spectrum is corrected, and the dotted line represents the spectral characteristic when the power spectrum is not corrected (the setting values are the same as above).
- When the power spectrum is corrected, the post-filter characteristic is almost flat in the 0 to FL band, while in the FL to FH band it is the same as when the power spectrum is not corrected.
- As described above, in this embodiment the power spectrum of the band corresponding to the layer information is corrected, modified LPC coefficients are calculated from the corrected power spectrum, and post-filter processing is performed using the calculated modified LPC coefficients.
- In this embodiment, the modified LPC coefficients are calculated whichever of layers 1 to 3 the layer information indicates. However, for a layer in which all encoded bands have almost the same quality (here, layer 1 with basic quality over the entire band and layer 3 with improved quality over the entire band), the modified LPC coefficients need not be calculated.
- In that case, setting values (γn, γd, and μ) that define the strength of the post filter may be prepared for each layer in advance, and the post filter may be configured directly by switching among the prepared setting values.
- This reduces the processing amount and processing time required to calculate the modified LPC coefficients.
- In this embodiment, power spectrum correction unit 713 performs processing common to all bands depending on whether background noise is present in the first layer decoded signal.
- However, the present invention can likewise be applied to the case where background noise detection unit 706 calculates the frequency characteristic of the background noise included in the first layer decoded signal and power spectrum correction unit 713 uses the result to switch the power spectrum correction for each subband.
- FIG. 24 is a block diagram showing the main configuration of the scalable decoding device according to Embodiment 8 of the present invention.
- Second switching unit 806 acquires the layer information from separation unit 801, determines from it up to which layer decoding results can be obtained, and outputs the decoded LPC coefficients of the highest layer to suppression information calculation unit 808.
- Depending on the decoding process, decoded LPC coefficients may not be generated for a layer; in such a case, one of the decoded LPC coefficients available to second switching unit 806 is selected.
- Background noise detection unit 807 receives the first layer decoded signal and determines whether this signal contains background noise. If it determines that the first layer decoded signal contains background noise, it applies frequency analysis such as MDCT to the background noise and outputs the analyzed frequency characteristic to suppression information calculation unit 808 as background noise information. If it determines that the first layer decoded signal does not contain background noise, it outputs information indicating that fact to suppression information calculation unit 808 as background noise information.
- As a background noise detection method, the input signal in a certain section can be analyzed to calculate its maximum power value and minimum power value, and the minimum power value can be taken as noise when the ratio or difference between the two is greater than or equal to a threshold value. Other general background noise detection methods can also be employed.
- In this embodiment, background noise detection unit 807 determines whether background noise is included in the first layer decoded signal. However, the present invention is not limited to this, and can likewise be applied to the case where it is detected whether background noise is included in the second layer decoded signal or the third layer decoded signal, and to the case where background noise information for the input signal is transmitted from the encoding device and the transmitted background noise information is used.
- Suppression information calculation unit 808 calculates suppression information using the layer information output from separation unit 801, the LPC coefficients output from second switching unit 806, and the background noise information output from background noise detection unit 807, and outputs the calculated suppression information to multiplier 809. Details of suppression information calculation unit 808 are described later.
- Multiplier 809 multiplies the decoded spectrum output from switching unit 805 by the suppression information output from suppression information calculation unit 808, and outputs the resulting decoded spectrum to time domain transform unit 810.
- Time domain transform unit 810 applies inverse MDCT processing to the decoded spectrum output from multiplier 809, multiplies the result by an appropriate window function, and adds it to the corresponding region of the windowed signal of the previous frame to generate and output the output signal.
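The inverse MDCT, windowing, and overlap-add steps of time domain transform unit 810 can be sketched as follows. The sine window, the 2/N scaling, and the direct O(N²) evaluation are assumptions chosen so that the example reconstructs its input exactly; the forward `mdct` is included only so that the round trip can be checked.

```python
import math
from typing import List, Tuple


def sine_window(length: int) -> List[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame: List[float]) -> List[float]:
    """Forward MDCT of an already-windowed frame of length 2N."""
    n = len(frame) // 2
    n0 = 0.5 + n / 2.0
    return [sum(x * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t, x in enumerate(frame)) for k in range(n)]


def imdct(coeffs: List[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    n0 = 0.5 + n / 2.0
    return [(2.0 / n) * sum(c * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                            for k, c in enumerate(coeffs))
            for t in range(2 * n)]


def synthesize(coeffs: List[float], prev_tail: List[float],
               window: List[float]) -> Tuple[List[float], List[float]]:
    """Window the IMDCT output and overlap-add with the previous frame.

    Returns (N output samples, tail to carry into the next frame).
    """
    n = len(coeffs)
    y = [s * w for s, w in zip(imdct(coeffs), window)]
    out = [y[i] + prev_tail[i] for i in range(n)]
    return out, y[n:]
```

Each call yields N finished samples and a tail of N samples; the tail is passed to the next call, which is the "corresponding region of the previous frame" in the description.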
- FIG. 25 is a block diagram showing an internal configuration of suppression information calculation section 808 shown in FIG.
- LPC spectrum calculation unit 821 performs a discrete Fourier transform on the decoded LPC coefficients output from second switching unit 806, calculates the energy of each complex spectrum component, and outputs the calculated energy to LPC spectrum correction unit 822 as the LPC spectrum. That is, a filter represented by the following equation (13) is configured.
- LPC spectrum calculation unit 821 calculates the spectral characteristic of the filter represented by equation (13) and outputs it to LPC spectrum correction unit 822.
- Here, NP represents the order of the decoded LPC coefficients.
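Equation (13) is not reproduced in this text. Assuming it is the usual all-pole synthesis filter 1 / A(z) with A(z) = 1 + Σ a(i) z⁻ⁱ for i = 1..NP, the LPC spectrum is the reciprocal of the energy of the DFT of the coefficient sequence. The sketch below evaluates it directly at evenly spaced frequencies in [0, π); the number of bins and the floor value are illustrative assumptions.

```python
import cmath
import math
from typing import List


def lpc_spectrum(a: List[float], n_bins: int) -> List[float]:
    """Power response 1/|A(e^{jw})|^2 at w = pi*k/n_bins, k = 0..n_bins-1.

    `a` holds [1, a(1), ..., a(NP)] under the assumed sign convention.
    """
    spectrum = []
    for k in range(n_bins):
        w = math.pi * k / n_bins
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        energy = abs(a_w) ** 2
        spectrum.append(1.0 / max(energy, 1e-12))  # floor avoids 1/0
    return spectrum
```

For the single coefficient a(1) = −0.5, the response is 4 at zero frequency and 0.8 at a quarter of the sampling rate, a gentle low-pass envelope.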
- Alternatively, a filter represented by the following equation (14) may be configured using predetermined parameters γn and γd for adjusting the degree of noise suppression, and the spectral characteristic of that filter may be calculated, where 0 < γn < γd < 1.
- The filter represented by equation (13) or equation (14) may have a characteristic in which the low band (or high band) is excessively emphasized relative to the high band (or low band); this characteristic is generally called "spectral tilt". A filter for compensating it (an anti-tilt filter) may be used in combination.
- Like power spectrum correction unit 713 in Embodiment 7, LPC spectrum correction unit 822 corrects the LPC spectrum output from LPC spectrum calculation unit 821 based on the correction band information output from correction band determination unit 823, and outputs the corrected LPC spectrum to suppression coefficient calculation unit 824.
- Suppression coefficient calculation unit 824 calculates the suppression coefficient by the following method using the background noise information.
- Suppression coefficient calculation unit 824 divides the corrected LPC spectrum output from LPC spectrum correction unit 822 into subbands of a predetermined bandwidth and obtains the average value of each subband. It then selects the subbands whose average value is smaller than a predetermined threshold and calculates a coefficient (vector value) for suppressing the decoded spectrum based on the selected subbands. This makes it possible to attenuate subbands that include bands forming valleys of the spectrum. The suppression coefficient is calculated based on the average value of the selected subband.
- Specifically, the suppression coefficient is calculated, for example, by multiplying the average value of the subband by a predetermined coefficient. For subbands whose average value is equal to or greater than the predetermined threshold, a coefficient that leaves the decoded spectrum unchanged is calculated.
- The suppression coefficient need not be an LPC coefficient; it may be a coefficient by which the decoded spectrum is multiplied directly. As a result, the inverse transform and LPC analysis are unnecessary, and the amount of calculation they would require is saved.
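The threshold-based rule can be sketched as follows. The clamp of the coefficient to at most 1 is an assumption added so that a "suppression" coefficient never amplifies; the threshold and scale values in any real system would be tuning parameters that the text does not specify.

```python
from typing import List


def suppression_coefficients(lpc_spec: List[float], band_width: int,
                             threshold: float, scale: float) -> List[float]:
    """Per-bin coefficients: attenuate subbands whose mean is below threshold."""
    coeffs: List[float] = []
    for start in range(0, len(lpc_spec), band_width):
        band = lpc_spec[start:start + band_width]
        mean = sum(band) / len(band)
        if mean < threshold:
            gain = min(1.0, scale * mean)  # spectral valley: attenuate
        else:
            gain = 1.0                     # leave the decoded spectrum as is
        coeffs.extend([gain] * len(band))
    return coeffs


def apply_suppression(decoded_spectrum: List[float],
                      coeffs: List[float]) -> List[float]:
    """What multiplier 809 does: element-wise product."""
    return [s * g for s, g in zip(decoded_spectrum, coeffs)]
```

Because the coefficients multiply the decoded spectrum directly, no LPC analysis or time-domain filtering is needed in this path.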
- Suppression coefficient calculation unit 824 may instead calculate the suppression coefficient by the following method. Suppression coefficient calculation unit 824 divides the corrected LPC spectrum output from LPC spectrum correction unit 822 into subbands of a predetermined bandwidth and obtains the average value of each subband. It then finds the subband with the largest average value and normalizes the average value of every subband by the average value of that subband. The normalized subband average values are output as the suppression coefficients.
- Although the suppression coefficient is output for each subband here, a suppression coefficient may instead be calculated and output for each frequency.
- In that case, suppression coefficient calculation unit 824 finds the frequency at which the corrected LPC spectrum output from LPC spectrum correction unit 822 is largest and normalizes the spectrum at each frequency by the spectrum at that frequency. The normalized spectrum is output as the suppression coefficient.
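Both normalization variants reduce to dividing by a maximum. A minimal sketch follows; the function names and the fallback to all-ones coefficients for a non-positive peak are assumptions.

```python
from typing import List


def normalized_subband_coefficients(lpc_spec: List[float],
                                    band_width: int) -> List[float]:
    """One coefficient per subband: subband mean divided by the largest mean."""
    means = []
    for start in range(0, len(lpc_spec), band_width):
        band = lpc_spec[start:start + band_width]
        means.append(sum(band) / len(band))
    peak = max(means)
    if peak <= 0.0:
        return [1.0] * len(means)  # assumed fallback: no suppression
    return [m / peak for m in means]


def normalized_bin_coefficients(lpc_spec: List[float]) -> List[float]:
    """One coefficient per frequency: each bin divided by the largest bin."""
    peak = max(lpc_spec)
    if peak <= 0.0:
        return [1.0] * len(lpc_spec)
    return [v / peak for v in lpc_spec]
```

In both variants the strongest subband or bin gets the coefficient 1 and everything else is attenuated in proportion to its distance below the spectral peak.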
- When the background noise information input to suppression coefficient calculation unit 824 indicates that "background noise is included in the first layer decoded signal", the suppression coefficient calculated as described above is finally determined according to the background noise level so that the effect of attenuating subbands that include spectral valleys is reduced.
- This makes it possible to make noise in the decoded signal as inconspicuous as possible when no background noise is present and to enhance the sense of bandwidth of the decoded signal as much as possible when background noise is present, so that a decoded signal of subjectively better quality can be generated.
- The LPC spectrum calculated from the decoded LPC coefficients is a spectral envelope from which the fine structure of the decoded signal has been removed, and the suppression coefficient is obtained directly from this spectral envelope.
- Furthermore, because the suppression coefficient is switched depending on whether background noise is included in the input signal (in the first layer decoded signal), a decoded signal of subjectively good quality can be generated in either case.
- Embodiments 1 to 3 and 5 to 8 have been described taking the case where the number of layers is 2 or 3 as an example, but the present invention can be applied to scalable coding with any number of layers as long as there are two or more.
- Embodiments 1 to 3 and 5 to 8 have been described taking scalable coding as an example, but the present invention can also be applied to other forms of hierarchical coding.
- The above embodiments have been described taking a speech signal as the encoding target, but the present invention is not limited to this and can also be applied to, for example, an audio signal.
- The frequency transform is likewise not limited to a particular one; FFT (Fast Fourier Transform), DFT (Discrete Fourier Transform), DCT, MDCT, subband filters, and the like can be used.
- The transform coding apparatus and transform coding method according to the present invention are not limited to the above embodiments and can be implemented with various modifications.
- The transform coding apparatus according to the present invention can be mounted on a communication terminal apparatus and a base station apparatus in a mobile communication system, thereby providing a communication terminal apparatus, a base station apparatus, and a mobile communication system having the same effects as described above.
- The present invention can also be implemented in software. For example, by describing the algorithm of the transform coding method according to the present invention in a programming language, storing the program in a memory, and executing it by an information processing means, functions similar to those of the transform coding apparatus can be realized.
- Each functional block used in the description of the above embodiments is typically realized as an LSI, which is an integrated circuit. The blocks may each be made into an individual chip, or a single chip may include some or all of them.
- The method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. It is also possible to use an FPGA (field programmable gate array) that can be programmed after LSI manufacturing, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured.
- The transform coding apparatus and transform coding method according to the present invention can be applied to uses such as a communication terminal apparatus and a base station apparatus in a mobile communication system.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/089,985 US8135588B2 (en) | 2005-10-14 | 2006-10-13 | Transform coder and transform coding method |
EP06821860A EP1953737B1 (en) | 2005-10-14 | 2006-10-13 | Transform coder and transform coding method |
BRPI0617447-7A BRPI0617447A2 (pt) | 2005-10-14 | 2006-10-13 | codificador de transformada e método de codificação de transformada |
CN2006800375449A CN101283407B (zh) | 2005-10-14 | 2006-10-13 | 变换编码装置和变换编码方法 |
JP2007540000A JP4954080B2 (ja) | 2005-10-14 | 2006-10-13 | 変換符号化装置および変換符号化方法 |
US13/367,840 US8311818B2 (en) | 2005-10-14 | 2012-02-07 | Transform coder and transform coding method |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005-300778 | 2005-10-14 | ||
JP2005300778 | 2005-10-14 | ||
JP2006272251 | 2006-10-03 | ||
JP2006-272251 | 2006-10-03 |
Related Child Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/089,985 A-371-Of-International US8135588B2 (en) | 2005-10-14 | 2006-10-13 | Transform coder and transform coding method |
US13/367,840 Continuation US8311818B2 (en) | 2005-10-14 | 2012-02-07 | Transform coder and transform coding method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007043648A1 (ja) | 2007-04-19 |
Family
ID=37942869
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2006/320457 WO2007043648A1 (ja) | 2005-10-14 | 2006-10-13 | 変換符号化装置および変換符号化方法 |
Country Status (8)
Country | Link |
---|---|
US (2) | US8135588B2 (ja) |
EP (1) | EP1953737B1 (ja) |
JP (1) | JP4954080B2 (ja) |
KR (1) | KR20080047443A (ja) |
CN (2) | CN101283407B (ja) |
BR (1) | BRPI0617447A2 (ja) |
RU (1) | RU2008114382A (ja) |
WO (1) | WO2007043648A1 (ja) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011058752A1 (ja) * | 2009-11-12 | 2011-05-19 | パナソニック株式会社 | 符号化装置、復号装置およびこれらの方法 |
WO2012005212A1 (ja) * | 2010-07-05 | 2012-01-12 | 日本電信電話株式会社 | 符号化方法、復号方法、符号化装置、復号装置、プログラム、及び記録媒体 |
WO2012032759A1 (ja) * | 2010-09-10 | 2012-03-15 | パナソニック株式会社 | 符号化装置及び符号化方法 |
WO2013051210A1 (ja) * | 2011-10-07 | 2013-04-11 | パナソニック株式会社 | 符号化装置及び符号化方法 |
JP2019152878A (ja) * | 2011-11-03 | 2019-09-12 | ヴォイスエイジ・コーポレーション | 時間領域デコーダによって復号化された時間領域励振の一般のオーディオ合成物を修正するための方法および装置 |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8660851B2 (en) | 2009-05-26 | 2014-02-25 | Panasonic Corporation | Stereo signal decoding device and stereo signal decoding method |
CN102804263A (zh) * | 2009-06-23 | 2012-11-28 | 日本电信电话株式会社 | 编码方法、解码方法、利用了这些方法的装置、程序 |
JP5754899B2 (ja) | 2009-10-07 | 2015-07-29 | ソニー株式会社 | 復号装置および方法、並びにプログラム |
WO2011045926A1 (ja) * | 2009-10-14 | 2011-04-21 | パナソニック株式会社 | 符号化装置、復号装置およびこれらの方法 |
EP2525354B1 (en) * | 2010-01-13 | 2015-04-22 | Panasonic Intellectual Property Corporation of America | Encoding device and encoding method |
JP5850216B2 (ja) | 2010-04-13 | 2016-02-03 | ソニー株式会社 | 信号処理装置および方法、符号化装置および方法、復号装置および方法、並びにプログラム |
JP5609737B2 (ja) | 2010-04-13 | 2014-10-22 | ソニー株式会社 | 信号処理装置および方法、符号化装置および方法、復号装置および方法、並びにプログラム |
US20130101028A1 (en) * | 2010-07-05 | 2013-04-25 | Nippon Telegraph And Telephone Corporation | Encoding method, decoding method, device, program, and recording medium |
JP6075743B2 (ja) | 2010-08-03 | 2017-02-08 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
JP5707842B2 (ja) | 2010-10-15 | 2015-04-30 | ソニー株式会社 | 符号化装置および方法、復号装置および方法、並びにプログラム |
WO2012144128A1 (ja) | 2011-04-20 | 2012-10-26 | パナソニック株式会社 | 音声音響符号化装置、音声音響復号装置、およびこれらの方法 |
WO2013035257A1 (ja) * | 2011-09-09 | 2013-03-14 | パナソニック株式会社 | 符号化装置、復号装置、符号化方法および復号方法 |
EP2770506A4 (en) * | 2011-10-19 | 2015-02-25 | Panasonic Ip Corp America | CODING DEVICE AND CODING METHOD |
WO2013067465A1 (en) * | 2011-11-04 | 2013-05-10 | Ess Technology, Inc. | Down-conversion of multiple rf channels |
JP6179087B2 (ja) * | 2012-10-24 | 2017-08-16 | 富士通株式会社 | オーディオ符号化装置、オーディオ符号化方法、オーディオ符号化用コンピュータプログラム |
JP6531649B2 (ja) | 2013-09-19 | 2019-06-19 | ソニー株式会社 | 符号化装置および方法、復号化装置および方法、並びにプログラム |
CN105849801B (zh) | 2013-12-27 | 2020-02-14 | 索尼公司 | 解码设备和方法以及程序 |
ES2709329T3 (es) * | 2014-04-25 | 2019-04-16 | Ntt Docomo Inc | Dispositivo de conversión de coeficiente de predicción lineal y procedimiento de conversión de coeficiente de predicción lineal |
FR3049084B1 (fr) * | 2016-03-15 | 2022-11-11 | Fraunhofer Ges Forschung | Dispositif de codage pour le traitement d'un signal d'entree et dispositif de decodage pour le traitement d'un signal code |
US10263765B2 (en) * | 2016-11-09 | 2019-04-16 | Khalifa University of Science and Technology | Systems and methods for low-power single-wire communication |
CN108418612B (zh) * | 2017-04-26 | 2019-03-26 | 华为技术有限公司 | 一种指示及确定预编码向量的方法和设备 |
US11133891B2 (en) | 2018-06-29 | 2021-09-28 | Khalifa University of Science and Technology | Systems and methods for self-synchronized communications |
US10951596B2 (en) * | 2018-07-27 | 2021-03-16 | Khalifa University of Science and Technology | Method for secure device-to-device communication using multilayered cyphers |
US11380345B2 (en) * | 2020-10-15 | 2022-07-05 | Agora Lab, Inc. | Real-time voice timbre style transform |
US11457224B2 (en) * | 2020-12-29 | 2022-09-27 | Qualcomm Incorporated | Interlaced coefficients in hybrid digital-analog modulation for transmission of video data |
US11431962B2 (en) | 2020-12-29 | 2022-08-30 | Qualcomm Incorporated | Analog modulated video transmission with variable symbol rate |
US11553184B2 (en) | 2020-12-29 | 2023-01-10 | Qualcomm Incorporated | Hybrid digital-analog modulation for transmission of video data |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0651795A (ja) * | 1992-03-02 | 1994-02-25 | American Teleph & Telegr Co <Att> | 信号量子化装置及びその方法 |
JPH09190198A (ja) * | 1995-09-29 | 1997-07-22 | Rockwell Internatl Corp | 狭い帯域幅チャネルで音声を送信する方法、狭い帯域幅チャネルからデジタル化された音声を受信する方法、および狭い帯域幅チャネルで音声を送信する装置 |
JPH09230898A (ja) * | 1996-02-22 | 1997-09-05 | Nippon Telegr & Teleph Corp <Ntt> | 音響信号変換符号化方法及び復号化方法 |
JP2001255892A (ja) * | 2000-03-13 | 2001-09-21 | Nippon Telegr & Teleph Corp <Ntt> | ステレオ信号符号化方法 |
JP2002091498A (ja) * | 2000-09-19 | 2002-03-27 | Victor Co Of Japan Ltd | オーディオ信号符号化装置 |
JP2002335161A (ja) * | 2001-05-07 | 2002-11-22 | Sony Corp | 信号処理装置及び方法、信号符号化装置及び方法、並びに信号復号装置及び方法 |
JP2003273747A (ja) * | 2001-11-28 | 2003-09-26 | Victor Co Of Japan Ltd | 可変長符号化データ受信方法及び可変長符号化データ受信装置 |
JP2005300778A (ja) | 2004-04-08 | 2005-10-27 | Ricoh Co Ltd | 光走査装置、画像形成装置 |
JP2006272251A (ja) | 2005-03-30 | 2006-10-12 | Monobe Engineering:Kk | ストレーナーシステム |
Family Cites Families (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5684920A (en) * | 1994-03-17 | 1997-11-04 | Nippon Telegraph And Telephone | Acoustic signal transform coding method and decoding method having a high efficiency envelope flattening method therein |
JPH07261797A (ja) * | 1994-03-18 | 1995-10-13 | Mitsubishi Electric Corp | 信号符号化装置及び信号復号化装置 |
US5651090A (en) * | 1994-05-06 | 1997-07-22 | Nippon Telegraph And Telephone Corporation | Coding method and coder for coding input signals of plural channels using vector quantization, and decoding method and decoder therefor |
US5649051A (en) * | 1995-06-01 | 1997-07-15 | Rothweiler; Joseph Harvey | Constant data rate speech encoder for limited bandwidth path |
US5710863A (en) * | 1995-09-19 | 1998-01-20 | Chen; Juin-Hwey | Speech signal quantization using human auditory models in predictive coding systems |
US5956674A (en) * | 1995-12-01 | 1999-09-21 | Digital Theater Systems, Inc. | Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels |
US6119083A (en) * | 1996-02-29 | 2000-09-12 | British Telecommunications Public Limited Company | Training process for the classification of a perceptual signal |
JP3246715B2 (ja) * | 1996-07-01 | 2002-01-15 | Matsushita Electric Industrial Co., Ltd. | Audio signal compression method and audio signal compression apparatus |
US6202046B1 (en) * | 1997-01-23 | 2001-03-13 | Kabushiki Kaisha Toshiba | Background noise/speech classification method |
US6345246B1 (en) * | 1997-02-05 | 2002-02-05 | Nippon Telegraph And Telephone Corporation | Apparatus and method for efficiently coding plural channels of an acoustic signal at low bit rates |
US6167375A (en) * | 1997-03-17 | 2000-12-26 | Kabushiki Kaisha Toshiba | Method for encoding and decoding a speech signal including background noise |
CA2230188A1 (en) * | 1998-03-27 | 1999-09-27 | William C. Treurniet | Objective audio quality measurement |
WO1999050828A1 (en) * | 1998-03-30 | 1999-10-07 | Voxware, Inc. | Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment |
SE9903553D0 (sv) * | 1999-01-27 | 1999-10-01 | Lars Liljeryd | Enhancing perceptual performance of SBR and related coding methods by adaptive noise addition (ANA) and noise substitution limiting (NSL) |
US6691082B1 (en) * | 1999-08-03 | 2004-02-10 | Lucent Technologies Inc | Method and system for sub-band hybrid coding |
US7171355B1 (en) * | 2000-10-25 | 2007-01-30 | Broadcom Corporation | Method and apparatus for one-stage and two-stage noise feedback coding of speech and audio signals |
US6842761B2 (en) * | 2000-11-21 | 2005-01-11 | America Online, Inc. | Full-text relevancy ranking |
JP3404016B2 (ja) * | 2000-12-26 | 2003-05-06 | Mitsubishi Electric Corporation | Speech coding apparatus and speech coding method |
US7200561B2 (en) * | 2001-08-23 | 2007-04-03 | Nippon Telegraph And Telephone Corporation | Digital signal coding and decoding methods and apparatuses and programs therefor |
US6934677B2 (en) * | 2001-12-14 | 2005-08-23 | Microsoft Corporation | Quantization matrices based on critical band pattern information for digital audio wherein quantization bands differ from critical bands |
US7146313B2 (en) * | 2001-12-14 | 2006-12-05 | Microsoft Corporation | Techniques for measurement of perceptual audio quality |
CN1275222C (zh) * | 2001-12-25 | 2006-09-13 | NTT DoCoMo, Inc. | Signal coding apparatus and signal coding method |
US6947886B2 (en) * | 2002-02-21 | 2005-09-20 | The Regents Of The University Of California | Scalable compression of audio and other signals |
US20040002856A1 (en) * | 2002-03-08 | 2004-01-01 | Udaya Bhaskar | Multi-rate frequency domain interpolative speech CODEC system |
WO2003091989A1 (en) * | 2002-04-26 | 2003-11-06 | Matsushita Electric Industrial Co., Ltd. | Coding device, decoding device, coding method, and decoding method |
CA2464408C (en) * | 2002-08-01 | 2012-02-21 | Matsushita Electric Industrial Co., Ltd. | Audio decoding apparatus and method for band expansion with aliasing suppression |
US7054807B2 (en) * | 2002-11-08 | 2006-05-30 | Motorola, Inc. | Optimizing encoder for efficiently determining analysis-by-synthesis codebook-related parameters |
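CN1420487A (zh) * | 2002-12-19 | 2003-05-28 | Beijing University of Technology | One-step interpolative predictive vector quantization method for line spectral frequency parameters at 1 kb/s |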
US7349842B2 (en) * | 2003-09-29 | 2008-03-25 | Sony Corporation | Rate-distortion control scheme in audio encoding |
US7613607B2 (en) * | 2003-12-18 | 2009-11-03 | Nokia Corporation | Audio enhancement in coded domain |
TWI231656B (en) * | 2004-04-08 | 2005-04-21 | Univ Nat Chiao Tung | Fast bit allocation algorithm for audio coding |
US7490044B2 (en) * | 2004-06-08 | 2009-02-10 | Bose Corporation | Audio signal processing |
AU2006232364B2 (en) * | 2005-04-01 | 2010-11-25 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband speech coding |
US7539612B2 (en) * | 2005-07-15 | 2009-05-26 | Microsoft Corporation | Coding and decoding scale factor information |
TWI271703B (en) * | 2005-07-22 | 2007-01-21 | Pixart Imaging Inc | Audio encoder and method thereof |
US7953604B2 (en) * | 2006-01-20 | 2011-05-31 | Microsoft Corporation | Shape and scale parameters for extended-band frequency coding |
US9454974B2 (en) * | 2006-07-31 | 2016-09-27 | Qualcomm Incorporated | Systems, methods, and apparatus for gain factor limiting |
US8374857B2 (en) * | 2006-08-08 | 2013-02-12 | Stmicroelectronics Asia Pacific Pte, Ltd. | Estimating rate controlling parameters in perceptual audio encoders |
US7873514B2 (en) * | 2006-08-11 | 2011-01-18 | Ntt Docomo, Inc. | Method for quantizing speech and audio through an efficient perceptually relevant search of multiple quantization patterns |
2006
- 2006-10-13: JP application JP2007540000A, published as JP4954080B2 (ja). Status: not active (Expired - Fee Related)
- 2006-10-13: WO application PCT/JP2006/320457, published as WO2007043648A1 (ja). Status: active (Application Filing)
- 2006-10-13: BR application BRPI0617447-7A, published as BRPI0617447A2 (pt). Status: not active (IP Right Cessation)
- 2006-10-13: CN application CN2006800375449A, published as CN101283407B (zh). Status: not active (Expired - Fee Related)
- 2006-10-13: CN application CN2012100616620A, published as CN102623014A (zh). Status: active (Pending)
- 2006-10-13: KR application KR1020087008677A, published as KR20080047443A (ko). Status: not active (Application Discontinuation)
- 2006-10-13: EP application EP06821860A, published as EP1953737B1 (en). Status: not active (Not-in-force)
- 2006-10-13: US application US12/089,985, published as US8135588B2 (en). Status: active (Active)
- 2006-10-13: RU application RU2008114382/09A, published as RU2008114382A (ru). Status: not active (Application Discontinuation)

2012
- 2012-02-07: US application US13/367,840, published as US8311818B2 (en). Status: active (Active)
Non-Patent Citations (3)
Title |
---|
"Everything about MPEG-4", 30 September 1998, KOGYO CHOSAKAI PUBLISHING, INC., pages: 126 - 127 |
NAOKI IWAKAMI ET AL.: "Audio Coding Using Transform-Domain Weighted Interleave Vector Quantization (TwinVQ)", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J80-A, no. 5, May 1997 (1997-05-01), pages 830 - 837 |
See also references of EP1953737A4 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011058752A1 (ja) * | 2009-11-12 | 2011-05-19 | Panasonic Corporation | Encoding apparatus, decoding apparatus, and methods therefor |
US8838443B2 (en) | 2009-11-12 | 2014-09-16 | Panasonic Intellectual Property Corporation Of America | Encoder apparatus, decoder apparatus and methods of these |
WO2012005212A1 (ja) * | 2010-07-05 | 2012-01-12 | Nippon Telegraph and Telephone Corporation | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
JP5337305B2 (ja) * | 2010-07-05 | 2013-11-06 | Nippon Telegraph and Telephone Corporation | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
US8711012B2 (en) | 2010-07-05 | 2014-04-29 | Nippon Telegraph And Telephone Corporation | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
US9361892B2 (en) | 2010-09-10 | 2016-06-07 | Panasonic Intellectual Property Corporation Of America | Encoder apparatus and method that perform preliminary signal selection for transform coding before main signal selection for transform coding |
WO2012032759A1 (ja) * | 2010-09-10 | 2012-03-15 | Panasonic Corporation | Encoding device and encoding method |
CN103069483A (zh) * | 2010-09-10 | 2013-04-24 | Panasonic Corporation | Encoding device and encoding method |
JP5679470B2 (ja) * | 2010-09-10 | 2015-03-04 | Panasonic Intellectual Property Corporation of America | Encoding device and encoding method |
WO2013051210A1 (ja) * | 2011-10-07 | 2013-04-11 | Panasonic Corporation | Encoding device and encoding method |
JPWO2013051210A1 (ja) * | 2011-10-07 | 2015-03-30 | Panasonic Intellectual Property Corporation of America | Encoding device and encoding method |
US9558752B2 (en) | 2011-10-07 | 2017-01-31 | Panasonic Intellectual Property Corporation Of America | Encoding device and encoding method |
JP2019152878A (ja) * | 2011-11-03 | 2019-09-12 | VoiceAge Corporation | Method and apparatus for modifying a general audio synthesis of a time-domain excitation decoded by a time-domain decoder |
Also Published As
Publication number | Publication date |
---|---|
CN102623014A (zh) | 2012-08-01 |
EP1953737A1 (en) | 2008-08-06 |
EP1953737A4 (en) | 2011-11-09 |
US20090281811A1 (en) | 2009-11-12 |
CN101283407A (zh) | 2008-10-08 |
US8311818B2 (en) | 2012-11-13 |
US8135588B2 (en) | 2012-03-13 |
EP1953737B1 (en) | 2012-10-03 |
JP4954080B2 (ja) | 2012-06-13 |
US20120136653A1 (en) | 2012-05-31 |
JPWO2007043648A1 (ja) | 2009-04-16 |
KR20080047443A (ko) | 2008-05-28 |
CN101283407B (zh) | 2012-05-23 |
RU2008114382A (ru) | 2009-10-20 |
BRPI0617447A2 (pt) | 2012-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2007043648A1 (ja) | | Transform coding apparatus and transform coding method |
KR102240271B1 (ko) | | Apparatus and method for generating a bandwidth extension signal |
KR101213840B1 (ko) | | Decoding apparatus and decoding method, and communication terminal apparatus and base station apparatus comprising the decoding apparatus |
JP4954069B2 (ja) | | Post filter, decoding apparatus, and post-filtering method |
JP5328368B2 (ja) | | Encoding apparatus, decoding apparatus, and methods therefor |
JP4861196B2 (ja) | | Method and device for low-frequency emphasis during audio compression based on ACELP/TCX |
RU2471252C2 (ru) | | Encoding device and encoding method |
JP5247826B2 (ja) | | System and method for enhancing a decoded tonal sound signal |
JP6980871B2 (ja) | | Signal encoding method and apparatus, and signal decoding method and apparatus |
US20070147518A1 (en) | | Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX |
WO2007037361A1 (ja) | | Speech coding apparatus and speech coding method |
WO2008072737A1 (ja) | | Encoding apparatus, decoding apparatus, and methods therefor |
WO2010127617A1 (en) | | Methods for receiving digital audio signal using processor and correcting lost data in digital audio signal |
WO2006041055A1 (ja) | | Scalable encoding apparatus, scalable decoding apparatus, and scalable encoding method |
JPWO2008084688A1 (ja) | | Encoding apparatus, decoding apparatus, and methods therefor |
EP2571170B1 (en) | | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
US20100280830A1 (en) | | Decoder |
RU2464650C2 (ru) | | Encoding apparatus and method, decoding apparatus and method |
KR20160098597A (ko) | | Apparatus and method for a signal codec in a communication system |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 200680037544.9; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
ENP | Entry into the national phase | Ref document number: 2007540000; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 12089985; Country of ref document: US. Ref document number: 2008114382; Country of ref document: RU. Ref document number: 1020087008677; Country of ref document: KR |
WWE | WIPO information: entry into national phase | Ref document number: 717/MUMNP/2008; Country of ref document: IN |
WWE | WIPO information: entry into national phase | Ref document number: 2006821860; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: PI0617447; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20080414 |