WO2006070751A1 - Speech coding apparatus and speech coding method - Google Patents
Speech coding apparatus and speech coding method
- Publication number
- WO2006070751A1 (application PCT/JP2005/023802)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- channel
- monaural
- prediction
- speech
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
Definitions
- the present invention relates to a speech coding apparatus and speech coding method, and more particularly to a speech coding apparatus and speech coding method for stereo speech.
- a voice coding scheme having a scalable configuration is desired in order to control traffic on the network and realize multicast communication.
- a scalable configuration refers to a configuration in which audio data can be decoded even from partial encoded data on the receiving side.
- Non-Patent Document 1: Ramprashad, S. A., "Stereophonic CELP coding using cross channel prediction", Proc. IEEE Workshop on Speech Coding, pp. 136-138, Sep. 2000.
- with the method of Non-Patent Document 1, when the correlation between both channels is small, the prediction performance (prediction gain) between the channels decreases and the coding efficiency degrades.
- An object of the present invention is to provide a speech coding apparatus and a speech coding method that have a monaural-stereo scalable configuration and can efficiently encode stereo speech even when the correlation between the channel signals of a stereo signal is small.
- the speech coding apparatus includes first coding means that performs coding using a monaural signal of a core layer, and second coding means that performs coding using a stereo signal of an enhancement layer. The first coding means includes generation means that takes as its input signal a stereo signal including a first channel signal and a second channel signal, and generates a monaural signal from the first channel signal and the second channel signal. The second coding means includes synthesizing means that synthesizes a prediction signal of the first channel signal or the second channel signal based on a signal obtained from the monaural signal.
- stereo sound can be efficiently encoded even when the correlation between a plurality of channel signals of a stereo signal is small.
- FIG. 1 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a block diagram showing the configuration of the first channel and second channel prediction signal synthesis units according to Embodiment 1 of the present invention.
- FIG. 3 is a block diagram showing the configuration of the first channel and second channel prediction signal synthesis units according to Embodiment 1 of the present invention.
- FIG. 4 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 1 of the present invention.
- FIG. 5 is an operation explanatory diagram of the speech coding apparatus according to Embodiment 1 of the present invention.
- FIG. 6 is an operation explanatory diagram of the speech coding apparatus according to Embodiment 1 of the present invention.
- FIG. 7 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 2 of the present invention.
- FIG. 8 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 2 of the present invention.
- FIG. 9 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 3 of the present invention.
- FIG. 10 is a block diagram showing the configuration of the first channel and second channel CELP coding sections according to Embodiment 3 of the present invention.
- FIG. 11 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 3 of the present invention.
- FIG. 12 is a block diagram showing a configuration of the first channel and second channel CELP decoding sections according to Embodiment 3 of the present invention.
- FIG. 13 is an operation flowchart of the speech coding apparatus according to Embodiment 3 of the present invention.
- FIG. 14 is an operation flowchart of the first channel and second channel CELP coding sections according to Embodiment 3 of the present invention.
- FIG. 15 is a block diagram showing another configuration of the speech coding apparatus according to Embodiment 3 of the present invention.
- FIG. 16 is a block diagram showing another configuration of the first channel and second channel CELP coding sections according to Embodiment 3 of the present invention.
- FIG. 17 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 4 of the present invention.
- FIG. 18 is a block diagram showing the configuration of the first channel and second channel CELP coding sections according to Embodiment 4 of the present invention.
- FIG. 1 shows the configuration of the speech coding apparatus according to the present embodiment.
- Speech coding apparatus 100 shown in FIG. 1 includes a core layer coding unit 110 for monaural signals and an enhancement layer coding unit 120 for stereo signals. In the following description, the operation is assumed to be performed in units of frames.
- the monaural signal encoding unit 112 encodes the monaural signal s_mono(n) and outputs the encoded data of the monaural signal to the monaural signal decoding unit 113. The encoded data of the monaural signal is also multiplexed with the quantized codes and encoded data output from enhancement layer encoding section 120 and transmitted to the speech decoding apparatus as encoded data.
- the monaural signal decoding unit 113 generates a monaural decoded signal from the encoded data of the monaural signal and outputs it to the enhancement layer encoding section 120.
- the first channel prediction filter analysis unit 121 obtains and quantizes the first channel prediction filter parameters from the first channel speech signal s_ch1(n) and the monaural decoded signal, and outputs the first channel prediction filter quantization parameters to the first channel prediction signal synthesis unit 122.
- the monaural signal s_mono(n) output from the monaural signal generation unit 111 may be used as the input to the first channel prediction filter analysis unit 121 instead of the monaural decoded signal.
- the first channel prediction filter analysis unit 121 also outputs a first channel prediction filter quantized code obtained by encoding the first channel prediction filter quantization parameters. This first channel prediction filter quantized code is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- the first channel prediction signal synthesis unit 122 synthesizes the first channel prediction signal from the monaural decoded signal and the first channel prediction filter quantization parameters, and outputs the first channel prediction signal to subtractor 123. Details of the first channel prediction signal synthesis unit 122 will be described later.
- the subtractor 123 obtains the difference between the first channel speech signal, which is the input signal, and the first channel prediction signal, that is, the residual component of the first channel prediction signal with respect to the first channel input speech signal (the first channel prediction residual signal), and outputs it to the first channel prediction residual signal encoding unit 124.
- the first channel prediction residual signal encoding unit 124 encodes the first channel prediction residual signal and outputs first channel prediction residual encoded data.
- This first channel prediction residual encoded data is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- the second channel prediction filter analysis unit 125 obtains and quantizes the second channel prediction filter parameters from the second channel speech signal s_ch2(n) and the monaural decoded signal, and outputs the second channel prediction filter quantization parameters to the second channel prediction signal synthesis unit 126.
- the second channel prediction filter analysis unit 125 also outputs a second channel prediction filter quantized code obtained by encoding the second channel prediction filter quantization parameters. This second channel prediction filter quantized code is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- Second channel prediction signal synthesis unit 126 synthesizes the second channel prediction signal from the monaural decoded signal and the second channel prediction filter quantization parameters, and outputs the second channel prediction signal to subtractor 127. Details of the second channel prediction signal synthesis unit 126 will be described later.
- the subtractor 127 obtains the difference between the second channel speech signal, which is the input signal, and the second channel prediction signal, that is, the residual component of the second channel prediction signal with respect to the second channel input speech signal (the second channel prediction residual signal), and outputs it to the second channel prediction residual signal encoding unit 128.
- Second channel prediction residual signal encoding unit 128 encodes the second channel prediction residual signal and outputs second channel prediction residual encoded data.
- This second channel prediction residual encoded data is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- the details of the first channel prediction signal synthesis unit 122 and the second channel prediction signal synthesis unit 126 will now be described.
- the configurations of the first channel prediction signal synthesis unit 122 and the second channel prediction signal synthesis unit 126 are as shown in FIG. 2 <Configuration example 1> or FIG. 3 <Configuration example 2>.
- based on the correlation between each channel signal and the monaural signal, which is the sum signal of the first channel input signal and the second channel input signal, the delay difference (D samples) and amplitude ratio (g) of each channel signal relative to the monaural signal are used as prediction filter quantization parameters to synthesize the prediction signal of each channel from the monaural signal.
- each of the prediction signal synthesis units 122 and 126 includes a delay unit 201 and a multiplier 202, and synthesizes the prediction signal sp_ch(n) of each channel from the monaural decoded signal sd_mono(n) by the prediction expressed by equation (2), that is, by delaying the monaural decoded signal by D samples and multiplying it by g.
- in configuration example 2 (FIG. 3), the configuration shown in FIG. 2 is further provided with delay units 203-1 to 203-P, multipliers 204-1 to 204-P, and an adder 205.
- using the prediction coefficient sequence {a(0), a(1), a(2), ..., a(P)} (P is the prediction order, a(0) = 1.0) as prediction filter quantization parameters in addition to D and g, the prediction signal sp_ch(n) of each channel is synthesized from the monaural decoded signal sd_mono(n).
- $sp_{ch}(n) = \sum_{k=0}^{P} \{ g \cdot a(k) \cdot sd_{mono}(n - D - k) \}$ ... (3)
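The prediction in equation (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `synthesize_prediction` is hypothetical, and treating samples outside the frame as zero is an assumption (a real codec would carry over history from the previous frame). With the coefficient sequence reduced to a(0) = 1.0 alone, the same code performs the delay-and-scale prediction of configuration example 1.

```python
def synthesize_prediction(sd_mono, g, D, a=(1.0,)):
    """Predict one channel from the monaural decoded signal.

    Computes sp_ch(n) = sum_{k=0}^{P} g * a(k) * sd_mono(n - D - k).
    With a == (1.0,) this is a pure delay by D samples followed by a gain g.
    Samples outside the frame are treated as zero (assumption for this sketch).
    """
    out = []
    for n in range(len(sd_mono)):
        acc = 0.0
        for k, a_k in enumerate(a):
            idx = n - D - k
            if 0 <= idx < len(sd_mono):
                acc += g * a_k * sd_mono[idx]
        out.append(acc)
    return out


mono = [1.0, 2.0, 3.0, 4.0]
# Configuration example 1: delay D = 1 sample, amplitude ratio g = 0.5
pred_example1 = synthesize_prediction(mono, g=0.5, D=1)
# Configuration example 2: same D and g plus coefficients a(0) = 1.0, a(1) = 0.5
pred_example2 = synthesize_prediction(mono, g=0.5, D=1, a=(1.0, 0.5))
```

The encoder and the decoder would both run this same synthesis on the monaural decoded signal, which is why only D, g, and the coefficients need to be transmitted.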
- the prediction filter quantization parameters obtained by quantization are output to the first channel prediction signal synthesis unit 122 and the second channel prediction signal synthesis unit 126 having the above configuration.
- the first channel prediction filter analysis unit 121 and the second channel prediction filter analysis unit 125 also output prediction filter quantized codes obtained by encoding the prediction filter quantization parameters.
- the first channel prediction filter analysis unit 121 and the second channel prediction filter analysis unit 125 may obtain, from the correlation between the monaural decoded signal and the input audio signal of each channel, the delay difference D and the ratio g of average amplitudes per frame as prediction filter parameters.
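One way to obtain D and g for a frame is sketched below. The text states only that the parameters are derived from the correlation and the per-frame average amplitude ratio; choosing the lag with the largest cross-correlation, the search range `max_delay`, and the function name `analyze_prediction_filter` are assumptions made for this illustration. Quantization of the resulting parameters is omitted.

```python
import math


def analyze_prediction_filter(ch, mono, max_delay):
    """Return (D, g) for one frame.

    D: lag (in samples) of ch relative to mono with the largest
       cross-correlation (assumed selection rule).
    g: ratio of the mean absolute amplitudes of ch and mono over the frame.
    """
    best_d, best_corr = 0, -math.inf
    for d in range(-max_delay, max_delay + 1):
        corr = 0.0
        for n in range(len(ch)):
            m = n - d
            if 0 <= m < len(mono):
                corr += ch[n] * mono[m]
        if corr > best_corr:
            best_d, best_corr = d, corr
    mean_abs_ch = sum(abs(x) for x in ch) / len(ch)
    mean_abs_mono = sum(abs(x) for x in mono) / len(mono)
    g = mean_abs_ch / mean_abs_mono if mean_abs_mono > 0 else 0.0
    return best_d, g


mono = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, 0.0]
# Channel signal built as 0.5 * mono delayed by 2 samples
ch = [0.0, 0.0, 0.0, 0.5, 0.0, -0.5, 0.0, 1.0]
D, g = analyze_prediction_filter(ch, mono, max_delay=4)
```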
- Speech decoding apparatus 300 shown in FIG. 4 includes core layer decoding section 310 for monaural signals and enhancement layer decoding section 320 for stereo signals.
- the monaural signal decoding unit 311 decodes the encoded data of the input monaural signal, outputs the monaural decoded signal to the enhancement layer decoding unit 320, and outputs it as the final output.
- the first channel prediction filter decoding unit 321 decodes the input first channel prediction filter quantized code and outputs the first channel prediction filter quantization parameters to the first channel prediction signal synthesis unit 322.
- the first channel prediction signal synthesis unit 322 has the same configuration as the first channel prediction signal synthesis unit 122 of speech coding apparatus 100, predicts the first channel speech signal from the monaural decoded signal and the first channel prediction filter quantization parameters, and outputs the first channel predicted speech signal to adder 324.
- the first channel prediction residual signal decoding section 323 decodes the input first channel prediction residual encoded data and outputs the first channel prediction residual signal to adder 324.
- Adder 324 adds the first channel predicted speech signal and the first channel prediction residual signal to obtain the first channel decoded signal, and outputs it as the final output.
- second channel prediction filter decoding section 325 decodes the input second channel prediction filter quantization code and outputs the second channel prediction filter quantization parameter to second channel prediction signal synthesis section 326.
- Second channel predicted signal synthesis section 326 adopts the same configuration as second channel predicted signal synthesis section 126 of speech encoding apparatus 100, and outputs the second channel speech signal from the monaural decoded signal and the second channel prediction filter quantization parameter. Predict and output the second channel predicted speech signal to adder 328.
- Second channel prediction residual signal decoding section 327 decodes the input second channel prediction residual code data and outputs the second channel prediction residual signal to adder 328.
- Adder 328 adds the second channel predicted speech signal and the second channel predicted residual signal to obtain a second channel decoded signal, and outputs it as the final output.
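- in speech decoding apparatus 300 having this monaural-stereo scalable configuration, when the output speech is monaural, the decoded signal obtained only from the encoded data of the monaural signal is output as the monaural decoded signal.
- when the output speech is stereo, the first channel decoded signal and second channel decoded signal are decoded and output using all of the received encoded data and quantized codes.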
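The scalable decoding behavior can be illustrated with a small sketch. It assumes the simple delay-and-gain predictor, and the container layout (`layers` as a list of dicts with keys `g`, `D`, and `residual`) and the function names are hypothetical; entropy decoding and quantization are left out.

```python
def predict(mono, g, D):
    """Delay the monaural decoded signal by D samples and scale it by g."""
    return [g * mono[n - D] if 0 <= n - D < len(mono) else 0.0
            for n in range(len(mono))]


def decode_frame(mono_decoded, layers=None):
    """Core layer only -> monaural output; with enhancement data -> per-channel output.

    layers: one dict per channel holding the decoded prediction parameters
    and the decoded prediction residual (hypothetical layout).
    """
    if layers is None:
        return mono_decoded
    outputs = []
    for params in layers:
        pred = predict(mono_decoded, params["g"], params["D"])
        # Adder 324 / 328: predicted signal plus prediction residual
        outputs.append([p + r for p, r in zip(pred, params["residual"])])
    return tuple(outputs)


mono = [1.0, 2.0, 3.0]
mono_only = decode_frame(mono)
ch1, ch2 = decode_frame(mono, [
    {"g": 1.0, "D": 0, "residual": [0.5, 0.5, 0.5]},
    {"g": 0.5, "D": 1, "residual": [0.0, 0.0, 0.0]},
])
```

A receiver that discards the enhancement-layer data still produces usable monaural output, which is the point of the layered design.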
- the monaural signal according to the present embodiment is a signal obtained by adding the first channel audio signal s_ch1 and the second channel audio signal s_ch2, and is therefore an intermediate signal that includes the signal components of both channels. Consequently, even if the inter-channel correlation between the first channel audio signal and the second channel audio signal is small, the correlation between the first channel audio signal and the monaural signal and the correlation between the second channel audio signal and the monaural signal are expected to be larger than the inter-channel correlation.
- therefore, the prediction gain when predicting the first channel audio signal from the monaural signal and the prediction gain when predicting the second channel audio signal from the monaural signal (FIG. 5: prediction gain B) are expected to be larger than the prediction gain when predicting the second channel audio signal from the first channel audio signal and the prediction gain when predicting the first channel audio signal from the second channel audio signal (FIG. 5: prediction gain A).
- FIG. 6 summarizes this relationship. When the inter-channel correlation between the first channel audio signal and the second channel audio signal is sufficiently large, prediction gain A and prediction gain B do not differ much, and both values are sufficiently large. When the inter-channel correlation is small, however, prediction gain A drops sharply compared with the case where the inter-channel correlation is sufficiently large, whereas prediction gain B is expected to drop less than prediction gain A and therefore to remain larger than prediction gain A.
- in this way, the signal of each channel is predicted and synthesized from the monaural signal, which is an intermediate signal including the signal components of both the first channel audio signal and the second channel audio signal.
- as a result, even for signals of a plurality of channels with small inter-channel correlation, a signal with a larger prediction gain than before can be synthesized.
- equivalent sound quality can thus be obtained by encoding at a lower bit rate, and higher sound quality speech can be obtained at an equivalent bit rate. Therefore, according to the present embodiment, coding efficiency can be improved.
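The argument about prediction gain A versus prediction gain B can be checked numerically for the extreme case of two uncorrelated channels. The sketch below is only an illustration under stated assumptions: the channels are independent Gaussian noise, the predictor is a single least-squares gain with no delay, and the monaural signal is taken as the average of the two channels (the scaling does not affect the gain). The helper name `prediction_gain` is hypothetical.

```python
import random


def prediction_gain(target, reference):
    """Target energy divided by the residual energy left after the best
    single-gain (zero-delay) prediction of target from reference."""
    num = sum(t * r for t, r in zip(target, reference))
    den = sum(r * r for r in reference)
    g = num / den
    err = sum((t - g * r) ** 2 for t, r in zip(target, reference))
    return sum(t * t for t in target) / err


rng = random.Random(0)
ch1 = [rng.gauss(0, 1) for _ in range(4000)]
ch2 = [rng.gauss(0, 1) for _ in range(4000)]   # independent of ch1
mono = [(a + b) / 2 for a, b in zip(ch1, ch2)]

gain_a = prediction_gain(ch1, ch2)    # channel predicted from the other channel
gain_b = prediction_gain(ch1, mono)   # channel predicted from the monaural signal
```

For independent channels of equal power, theory gives a gain close to 1 (0 dB) for prediction from the other channel and close to 2 (about 3 dB) for prediction from the monaural signal, which matches the tendency described for FIG. 6.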
- FIG. 7 shows the configuration of speech encoding apparatus 400 according to the present embodiment.
- speech coding apparatus 400 has a configuration in which the second channel prediction filter analysis unit 125, second channel prediction signal synthesis unit 126, subtractor 127, and second channel prediction residual signal encoding unit 128 are removed from the configuration shown in FIG. 1 (Embodiment 1).
- that is, speech coding apparatus 400 synthesizes a prediction signal only for the first channel of the first and second channels, and transmits only the encoded data of the monaural signal, the first channel prediction filter quantized code, and the first channel prediction residual encoded data to the speech decoding apparatus.
- speech decoding apparatus 500 has a configuration in which second channel prediction filter decoding section 325, second channel prediction signal synthesis section 326, second channel prediction residual signal decoding section 327, and adder 328 are removed from the configuration shown in FIG. 4 (Embodiment 1), and second channel decoded signal synthesis unit 331 is added instead.
- Second channel decoded signal synthesis unit 331 synthesizes the second channel decoded signal sd_ch2(n) according to equation (5), using the monaural decoded signal sd_mono(n) and the first channel decoded signal sd_ch1(n), based on the relationship shown in equation (1).
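As a sketch of how the second channel could be recovered, the code below assumes that equation (1) defines the monaural signal as the average of the two channels, s_mono(n) = (s_ch1(n) + s_ch2(n)) / 2; this scaling is an assumption, since the equations themselves are not reproduced in this text. Under that assumption the second channel follows as 2 * sd_mono(n) - sd_ch1(n). If the monaural signal were a plain sum instead, the factor 2 would be dropped. The function name `synthesize_ch2` is hypothetical.

```python
def synthesize_ch2(sd_mono, sd_ch1):
    """Recover the second channel from the monaural and first channel decoded signals.

    Assumes s_mono(n) = (s_ch1(n) + s_ch2(n)) / 2 (assumed form of equation (1)).
    """
    return [2.0 * m - c for m, c in zip(sd_mono, sd_ch1)]


ch1 = [1.0, 2.0, 3.0]
ch2 = [3.0, 2.0, -1.0]
mono = [(a + b) / 2 for a, b in zip(ch1, ch2)]
recovered = synthesize_ch2(mono, ch1)
```

In a real decoder the inputs are decoded (quantized) signals, so any coding error in the monaural and first channel signals carries over into the synthesized second channel.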
- enhancement layer encoding section 120 is configured to process only the 1st channel, but may be configured to process only the 2nd channel instead of the 1st channel.
- the apparatus configuration can be simplified as compared with the first embodiment.
- the encoding efficiency is further improved.
- FIG. 9 shows the configuration of speech encoding apparatus 600 according to the present embodiment.
- the core layer coding unit 110 includes a monaural signal generation unit 111 and a monaural signal CELP coding unit 114.
- the enhancement layer coding unit 120 includes a monaural driving excitation signal holding unit 131, a first channel CELP coding unit 132, and a second channel CELP coding unit 133.
- the monaural signal CELP coding unit 114 performs CELP coding on the monaural signal s_mono(n) generated by the monaural signal generation unit 111, and outputs the monaural signal encoded data and the monaural driving excitation signal obtained by the CELP coding.
- the monaural driving excitation signal is held in the monaural driving excitation signal holding unit 131.
- the first channel CELP coding unit 132 performs CELP coding on the first channel audio signal and outputs first channel encoded data.
- the second channel CELP coding unit 133 performs CELP coding on the second channel audio signal and outputs second channel encoded data.
- the first channel CELP coding unit 132 and the second channel CELP coding unit 133 use the monaural driving excitation signal held in the monaural driving excitation signal holding unit 131 to predict the driving excitation signal corresponding to the input audio signal of each channel.
- the Nth channel (N is 1 or 2) LPC analysis unit 401 performs LPC analysis on the Nth channel speech signal, quantizes the obtained LPC parameters, outputs the quantized LPC parameters to the Nth channel LPC prediction residual signal generation section 402 and the synthesis filter 409, and outputs the Nth channel LPC quantized code.
- when quantizing the LPC parameters, the Nth channel LPC analysis unit 401 exploits the fact that the correlation between the LPC parameters for the monaural signal and the LPC parameters obtained from the Nth channel audio signal (the Nth channel LPC parameters) is large: it decodes the monaural signal quantized LPC parameters from the encoded data of the monaural signal and quantizes the difference component of the Nth channel LPC parameters relative to the monaural signal quantized LPC parameters, thereby performing efficient quantization.
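The idea of quantizing only the difference from the monaural quantized LPC parameters can be sketched as follows. The uniform scalar quantizer, the step size, and the function name `quantize_lpc_diff` are stand-ins chosen for illustration; the text does not specify the quantizer, and practical codecs typically work on a transformed representation of the LPC parameters with vector quantization.

```python
def quantize_lpc_diff(nch_lpc, mono_q_lpc, step=0.01):
    """Quantize Nth channel LPC parameters as differences from the
    monaural quantized LPC parameters.

    Returns (codes, recon): integer codes for the differences and the
    parameters the decoder would reconstruct from them.
    Uniform scalar quantization is an assumption of this sketch.
    """
    codes = [round((x - m) / step) for x, m in zip(nch_lpc, mono_q_lpc)]
    recon = [m + c * step for m, c in zip(mono_q_lpc, codes)]
    return codes, recon


mono_q_lpc = [0.28, 0.65]   # decoded from the monaural encoded data
nch_lpc = [0.30, 0.62]      # obtained by LPC analysis of the Nth channel
codes, recon = quantize_lpc_diff(nch_lpc, mono_q_lpc)
```

Because the two parameter sets are strongly correlated, the differences stay small, so fewer bits are needed than for quantizing the Nth channel parameters directly.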
- Nth channel LPC prediction residual signal generation section 402 calculates an LPC prediction residual signal for the Nth channel speech signal using the Nth channel quantization LPC parameter, and outputs the LPC prediction residual signal to Nth channel prediction filter analysis section 403.
- Nth channel prediction filter analysis unit 403 obtains and quantizes the Nth channel prediction filter parameters from the LPC prediction residual signal and the monaural driving excitation signal, outputs the Nth channel prediction filter quantization parameters to the Nth channel driving excitation signal synthesis unit 404, and outputs the Nth channel prediction filter quantized code.
- Nth channel driving excitation signal synthesis unit 404 synthesizes a predicted driving excitation signal corresponding to the Nth channel audio signal using the monaural driving excitation signal and the Nth channel prediction filter quantization parameters, and outputs it to multiplier 407-1.
- Nth channel prediction filter analysis unit 403 corresponds to first channel prediction filter analysis unit 121 and second channel prediction filter analysis unit 125 in Embodiment 1 (FIG. 1), and their configuration and operation are the same.
- Nth channel driving excitation signal synthesis unit 404 corresponds to first channel prediction signal synthesis unit 122 and second channel prediction signal synthesis unit 126 in Embodiment 1 (FIGS. 1 to 3), and their configuration and operation are the same.
- however, the present embodiment differs from Embodiment 1 in that prediction is performed on the monaural driving excitation signal corresponding to the monaural signal to synthesize the predicted driving excitation signal of each channel, rather than performing prediction on the monaural decoded signal to synthesize the prediction signal of each channel.
- the excitation signal of the residual component (the error component that cannot be predicted) with respect to the predicted driving excitation signal is encoded by excitation search in CELP coding.
- that is, the first channel and second channel CELP coding units 132 and 133 have an Nth channel adaptive codebook 405 and an Nth channel fixed codebook 406, multiply the adaptive excitation, the fixed excitation, and the predicted driving excitation predicted from the monaural driving excitation signal by their respective gains, and add them together.
- a closed-loop excitation search by distortion minimization is performed on the driving excitation obtained by this addition.
- the adaptive excitation index, the fixed excitation index, and the gain codes for the adaptive excitation, the fixed excitation, and the predicted driving excitation signal are output as the Nth channel excitation encoded data. More specifically, this is done as follows.
- the synthesis filter 409 uses the quantized LPC parameters output from the Nth channel LPC analysis unit 401 to perform synthesis with the LPC synthesis filter, using as the driving excitation the excitation vectors generated by the Nth channel adaptive codebook 405 and the Nth channel fixed codebook 406 and the predicted driving excitation signal synthesized by the Nth channel driving excitation signal synthesis unit 404.
- of the resulting synthesized signal, the component corresponding to the Nth channel predicted driving excitation signal corresponds to the prediction signal of each channel output from the first channel prediction signal synthesis unit 122 or the second channel prediction signal synthesis unit 126 in Embodiment 1 (FIGS. 1 to 3).
- the synthesized signal thus obtained is output to subtractor 410.
- Subtractor 410 calculates an error signal by subtracting the synthesized signal output from synthesis filter 409 from the Nth channel audio signal, and outputs this error signal to perceptual weighting section 411. This error signal corresponds to coding distortion.
- Perceptual weighting section 411 applies perceptual weighting to the coding distortion output from subtractor 410 and outputs the result to distortion minimizing section 412.
- Distortion minimizing section 412 determines, for Nth channel adaptive codebook 405 and Nth channel fixed codebook 406, the indexes that minimize the coding distortion output from perceptual weighting section 411, and indicates to Nth channel adaptive codebook 405 and Nth channel fixed codebook 406 the indexes to be used. Distortion minimizing section 412 also generates the gains corresponding to those indexes, specifically the gains for the adaptive vector from Nth channel adaptive codebook 405 and the fixed vector from Nth channel fixed codebook 406 (the adaptive codebook gain and the fixed codebook gain), and outputs them to multipliers 407-2 and 407-4, respectively.
- the distortion minimizing section 412 also generates gains that adjust the balance among three types of signals, namely the predicted driving excitation signal output from the Nth channel driving excitation signal synthesis unit 404, the adaptive vector after gain multiplication in multiplier 407-2, and the fixed vector after gain multiplication in multiplier 407-4, and outputs them to multipliers 407-1, 407-3, and 407-5, respectively.
- the three gains that adjust the balance among these three types of signals are preferably generated so that their values are related to one another.
- the contribution of the predicted driving excitation signal is set relative to the contribution of the adaptive vector after gain multiplication and the contribution of the fixed vector after gain multiplication.
- the contribution of the predicted driving excitation signal may also be made relatively smaller than the contribution of the adaptive vector after gain multiplication and the contribution of the fixed vector after gain multiplication.
- distortion minimizing section 412 outputs those indexes, the codes of the gains corresponding to those indexes, and the codes of the inter-signal adjustment gains as the Nth channel excitation encoded data.
- the Nth channel adaptive codebook 405 stores in an internal buffer the excitation vectors of the driving excitation previously generated for synthesis filter 409, generates one subframe's worth from the stored excitation vectors based on the adaptive codebook lag (pitch lag or pitch period) corresponding to the index specified by distortion minimizing section 412, and outputs it to multiplier 407-2 as the adaptive codebook vector.
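Generating one subframe from the stored past excitation can be sketched as below. This follows the usual integer-lag adaptive codebook construction in CELP coders, where lags shorter than the subframe are handled by repeating the newly generated samples; whether this embodiment uses exactly that rule, or fractional lags, is not stated, so treat the details as assumptions. The function name `adaptive_codebook_vector` is hypothetical.

```python
def adaptive_codebook_vector(past_excitation, lag, subframe_len):
    """Build one subframe by copying the excitation from `lag` samples back.

    For lag < subframe_len the copy runs into samples generated in this
    call, which repeats the last `lag` samples periodically.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the buffer length")
    buf = list(past_excitation)
    start = len(buf)
    for n in range(subframe_len):
        buf.append(buf[start + n - lag])
    return buf[start:]


past = [1.0, 2.0, 3.0, 4.0, 5.0]
short_lag = adaptive_codebook_vector(past, lag=2, subframe_len=4)
long_lag = adaptive_codebook_vector(past, lag=5, subframe_len=3)
```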
- Nth channel fixed codebook 406 outputs the excitation vector corresponding to the index specified by distortion minimizing section 412 to multiplier 407-4 as a fixed codebook vector.
- Multiplier 407-2 multiplies the adaptive codebook vector output from Nth channel adaptive codebook 405 by the adaptive codebook gain, and outputs the result to multiplier 407-3.
- Multiplier 407-4 multiplies the fixed codebook vector output from Nth channel fixed codebook 406 by the fixed codebook gain, and outputs the result to multiplier 407-5.
- Multiplier 407-1 multiplies the predicted driving excitation signal output from Nth channel driving excitation signal synthesis unit 404 by its gain, and outputs the result to adder 408.
- Multiplier 407-3 multiplies the adaptive vector after gain multiplication in multiplier 407-2 by another gain and outputs the result to adder 408.
- Multiplier 407-5 multiplies the fixed vector after gain multiplication in multiplier 407-4 by another gain and outputs the result to adder 408.
- the adder 408 adds the predicted driving excitation signal output from multiplier 407-1, the adaptive codebook vector output from multiplier 407-3, and the fixed codebook vector output from multiplier 407-5, and outputs the resulting excitation vector to synthesis filter 409 as the driving excitation.
- the synthesis filter 409 uses the excitation vector output from the adder 408 as the driving excitation and performs synthesis with the LPC synthesis filter.
- the series of processes in which coding distortion is calculated using the excitation vectors generated by the Nth channel adaptive codebook 405 and the Nth channel fixed codebook 406 forms a closed loop, and distortion minimizing section 412 determines and outputs the indexes of the Nth channel adaptive codebook 405 and the Nth channel fixed codebook 406 that minimize the coding distortion.
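The closed loop described above (build the excitation from three gain-scaled components, run it through the LPC synthesis filter, measure the error, pick the best indexes) can be sketched as an exhaustive search. Several simplifications are assumptions of this sketch: perceptual weighting is omitted, the gains are fixed instead of being optimized, the cascaded gains (407-2 with 407-3, 407-4 with 407-5) are folded into one effective gain per component, and the filter starts from zero state. All function names are hypothetical.

```python
def lpc_synthesize(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_i lpc[i-1] * z^-i, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out


def build_excitation(pred, adaptive, fixed, gains):
    """Adder 408: sum of the three gain-scaled excitation components."""
    g_pred, g_adapt, g_fixed = gains
    return [g_pred * p + g_adapt * a + g_fixed * f
            for p, a, f in zip(pred, adaptive, fixed)]


def closed_loop_search(target, pred, adaptive_cands, fixed_cands, gains, lpc):
    """Return (distortion, adaptive_index, fixed_index) with minimum squared error."""
    best = None
    for i, adaptive in enumerate(adaptive_cands):
        for j, fixed in enumerate(fixed_cands):
            synth = lpc_synthesize(build_excitation(pred, adaptive, fixed, gains), lpc)
            dist = sum((t - s) ** 2 for t, s in zip(target, synth))
            if best is None or dist < best[0]:
                best = (dist, i, j)
    return best


lpc = [-0.5]
gains = (1.0, 0.5, 0.25)
pred = [1.0, 0.0, 0.0, 0.0]                      # predicted driving excitation
adaptive_cands = [[0.0] * 4, [0.0, 1.0, 0.0, 0.0]]
fixed_cands = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, -1.0, 0.0]]
# Target built from candidates (1, 2), so the search should find exactly those
target = lpc_synthesize(
    build_excitation(pred, adaptive_cands[1], fixed_cands[2], gains), lpc)
best = closed_loop_search(target, pred, adaptive_cands, fixed_cands, gains, lpc)
```

Practical CELP coders avoid the full joint search shown here and search the adaptive and fixed codebooks sequentially to keep complexity manageable.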
- FIG. 11 shows the configuration of speech decoding apparatus 700 according to the present embodiment.
- Speech decoding apparatus 700 shown in FIG. 11 includes core layer decoding section 310 for monaural signals and enhancement layer decoding section 320 for stereo signals.
- Monaural decoding unit 312 performs CELP decoding on encoded data of the input monaural signal, and outputs a monaural decoded signal and a monaural driving excitation signal obtained by CELP decoding.
- the monaural driving sound source signal is held in the monaural driving sound source signal holding unit 341.
- First channel (1st ch) CELP decoding section 342 performs CELP decoding on the first channel encoded data and outputs a first channel decoded signal.
- Second channel CELP decoding section 343 performs CELP decoding on the second channel encoded data and outputs a second channel decoded signal.
- 1st ch CELP decoding unit 342 and 2nd ch CELP decoding unit 343 use the monaural driving excitation signal held in monaural driving excitation signal holding unit 341 to predict the driving excitation signal corresponding to the encoded data of each channel, and perform CELP decoding on the prediction residual component.
- With this monaural-stereo scalable configuration, when the output speech is monaural, speech decoding apparatus 700 outputs a decoded signal obtained only from the encoded data of the monaural signal as the monaural decoded signal. When the output speech is stereo, it decodes and outputs the first channel decoded signal and the second channel decoded signal using all of the received encoded data.
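The monaural/stereo output switch can be pictured with a small dispatch routine. This is only a sketch: the frame layout (a dict with `"mono"`, `"ch1"`, `"ch2"` keys), the two decoder callables, and the fallback to monaural output when enhancement-layer data is absent are all assumptions made for illustration and are not stated in the source.

```python
def decode_scalable(frame, want_stereo, decode_mono, decode_channel):
    """Return decoded signals for one frame of a monaural-stereo scalable stream.

    frame          -- dict with "mono" and, optionally, "ch1"/"ch2" encoded data
    decode_mono    -- hypothetical core-layer decoder: data -> (signal, excitation)
    decode_channel -- hypothetical enhancement-layer decoder:
                      (data, mono_excitation) -> signal
    """
    # The core layer is always decoded; its excitation feeds the enhancement layer.
    mono_signal, mono_excitation = decode_mono(frame["mono"])
    has_enhancement = "ch1" in frame and "ch2" in frame
    if not want_stereo or not has_enhancement:
        # Monaural output uses only the monaural encoded data.
        return {"mono": mono_signal}
    # Stereo output uses all received encoded data.
    return {
        "ch1": decode_channel(frame["ch1"], mono_excitation),
        "ch2": decode_channel(frame["ch2"], mono_excitation),
    }
```

Passing the decoders in as callables keeps the sketch independent of any particular CELP implementation.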
- The configuration of 1st ch CELP decoding section 342 and 2nd ch CELP decoding section 343 is shown in FIG.
- From the monaural signal encoded data and the Nth channel encoded data (N is 1 or 2) transmitted from speech encoding device 600 (FIG. 9), the 1st ch and 2nd ch CELP decoding units 342 and 343 decode the Nth channel LPC quantized parameters and a CELP excitation signal that includes the prediction signal of the Nth channel driving excitation signal, and output the Nth channel decoded signal. More specifically, they operate as follows.
- N-th channel LPC parameter decoding section 501 decodes the Nth channel LPC quantized parameters using the monaural signal quantized LPC parameters, which are decoded from the monaural signal encoded data, and the N-th channel LPC quantization code, and outputs the obtained quantized LPC parameters to synthesis filter 508.
- Nth channel prediction filter decoding section 502 decodes the Nth channel prediction filter quantization code, and outputs the obtained Nth channel prediction filter quantization parameter to Nth channel excitation signal synthesis unit 503.
- N-th channel driving excitation signal synthesizer 503 uses the monaural driving excitation signal and the N-th channel prediction filter quantization parameter to synthesize a predicted driving excitation signal corresponding to the N-th channel audio signal, and outputs it to multiplier 506-1.
- Synthesis filter 508 uses the quantized LPC parameters output from Nth channel LPC parameter decoding section 501 to perform synthesis with an LPC synthesis filter, using as the driving excitation the excitation vectors generated by Nth channel adaptive codebook 504 and Nth channel fixed codebook 505 and the predicted driving excitation signal synthesized by Nth channel driving excitation signal synthesis unit 503.
- the obtained synthesized signal is output as the Nth channel decoded signal.
- Nth channel adaptive codebook 504 stores in an internal buffer the excitation vectors of the driving excitation previously supplied to synthesis filter 508, generates one subframe from the stored excitation vectors based on the adaptive codebook lag (pitch lag or pitch period) corresponding to the index included in the Nth channel excitation encoded data, and outputs it to multiplier 506-2 as the adaptive codebook vector.
- Nth channel fixed codebook 505 outputs the excitation vector corresponding to the index included in the Nth channel excitation encoded data to multiplier 506-4 as a fixed codebook vector.
- Multiplier 506-2 multiplies the adaptive codebook vector output from Nth channel adaptive codebook 504 by the adaptive codebook gain included in the Nth channel excitation encoded data, and outputs the result to multiplier 506-3.
- Multiplier 506-4 multiplies the fixed codebook vector output from Nth channel fixed codebook 505 by the fixed codebook gain included in the Nth channel excitation encoded data, and outputs the result to multiplier 506-5.
- Multiplier 506-1 multiplies the predicted driving excitation signal output from Nth channel driving excitation signal synthesis section 503 by the adjustment gain for the predicted driving excitation signal included in the Nth channel excitation encoded data, and outputs the result to adder 507.
- Multiplier 506-3 multiplies the adaptive codebook vector after gain multiplication in multiplier 506-2 by the adjustment gain for the adaptive codebook vector included in the Nth channel excitation encoded data, and outputs the result to adder 507.
- Multiplier 506-5 multiplies the fixed codebook vector after gain multiplication in multiplier 506-4 by the adjustment gain for the fixed codebook vector included in the Nth channel excitation encoded data, and outputs the result to adder 507.
- Adder 507 adds the predicted driving excitation signal output from multiplier 506-1, the adaptive codebook vector output from multiplier 506-3, and the fixed codebook vector output from multiplier 506-5, and outputs the summed excitation vector to synthesis filter 508 as the driving excitation.
- Synthesis filter 508 performs synthesis with the LPC synthesis filter using the excitation vector output from adder 507 as the driving excitation.
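The decoder-side signal path (multipliers 506-1 to 506-5, adder 507, synthesis filter 508) can be summarized in a few lines. The sketch below is illustrative only: the gain names are hypothetical, the filter starts from a zero state, and the LPC convention A(z) = 1 + sum of a_k z^-k is an assumption.

```python
def decode_subframe(predicted, adaptive, fixed, gains, lpc):
    """Mix the three excitation components and run LPC synthesis.

    gains is a dict with hypothetical keys:
      "pred_adj"            -- adjustment gain for the predicted excitation (506-1)
      "adapt", "adapt_adj"  -- codebook gain and adjustment gain (506-2, 506-3)
      "fixed", "fixed_adj"  -- codebook gain and adjustment gain (506-4, 506-5)
    """
    # Role of adder 507: sum of the gain-scaled components.
    excitation = [
        gains["pred_adj"] * p
        + gains["adapt"] * gains["adapt_adj"] * a
        + gains["fixed"] * gains["fixed_adj"] * f
        for p, a, f in zip(predicted, adaptive, fixed)
    ]
    # Role of synthesis filter 508: all-pole filter 1/A(z), zero initial state.
    mem = [0.0] * len(lpc)  # most recent output first
    decoded = []
    for x in excitation:
        y = x - sum(a_k * m for a_k, m in zip(lpc, mem))
        decoded.append(y)
        mem = ([y] + mem)[:len(lpc)]
    return excitation, decoded
```

Returning the excitation alongside the decoded samples reflects that the adaptive codebook needs the past driving excitation for the next subframe.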
- FIG. 13 shows a summary of the operation flow of the speech encoding apparatus 600 described above.
- First, a monaural signal is generated from the 1st channel audio signal and the 2nd channel audio signal (ST1301).
- Next, core layer CELP encoding is performed on the monaural signal (ST1302).
- Then 1st channel CELP encoding and 2nd channel CELP encoding are performed (ST1303, ST1304).
- FIG. 14 shows a summary of the operation flow of 1st ch and 2nd ch CELP coding sections 132 and 133. First, LPC analysis of the Nth channel and LPC parameter quantization are performed (ST1401), and then an LPC prediction residual signal of the Nth channel is generated (ST1402). Next, the Nth channel prediction filter is analyzed (ST1403), and the Nth channel driving excitation signal is predicted (ST1404). Finally, the Nth channel driving excitation search and gain search are performed (ST1405).
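Steps ST1402 to ST1404 can be illustrated with a toy example. The form of the prediction filter is not given in this passage, so the sketch assumes a one-tap delay-and-gain predictor fitted by least squares, purely for illustration; the function names and the LPC convention A(z) = 1 + sum of a_k z^-k are likewise assumptions.

```python
def lpc_residual(signal, lpc):
    """ST1402 (sketch): inverse-filter the signal with A(z) to get the LPC residual."""
    residual = []
    for n, s in enumerate(signal):
        acc = s
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * signal[n - k]
        residual.append(acc)
    return residual


def predict_excitation(mono_excitation, delay, gain, length):
    """ST1404 (sketch): delayed, scaled copy of the monaural excitation."""
    shifted = ([0.0] * delay + list(mono_excitation))[:length]
    return [gain * x for x in shifted]


def analyze_delay_gain(target_residual, mono_excitation, max_delay):
    """ST1403 (sketch): pick the delay and least-squares gain with the smallest error."""
    best = None
    for delay in range(max_delay + 1):
        shifted = ([0.0] * delay + list(mono_excitation))[:len(target_residual)]
        energy = sum(x * x for x in shifted)
        if energy == 0.0:
            continue
        gain = sum(t * x for t, x in zip(target_residual, shifted)) / energy
        err = sum((t - gain * x) ** 2 for t, x in zip(target_residual, shifted))
        if best is None or err < best[0]:
            best = (err, delay, gain)
    if best is None:
        return 0, 0.0
    return best[1], best[2]
```

The residual left after subtracting the predicted excitation is what the excitation search in ST1405 would then encode.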
- the prediction filter parameters are obtained by the Nth channel prediction filter analysis unit 403 prior to excitation coding by excitation search in CELP coding.
- Alternatively, a separate codebook for the prediction filter parameters may be provided, and in the CELP excitation search, together with searches such as the adaptive excitation search, the optimal prediction filter parameters may be determined from that codebook by a closed-loop search based on distortion minimization.
- As another alternative, N-th channel prediction filter analysis unit 403 may obtain a plurality of prediction filter parameter candidates and select the optimal prediction filter parameters from among them by a closed-loop search based on distortion minimization in the CELP excitation search. With such a configuration, better filter parameters can be calculated and prediction performance can be improved (that is, decoded speech quality can be improved).
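Selecting among several prediction filter candidates in a closed loop amounts to synthesizing with each candidate and keeping the one with the least distortion. The sketch below assumes, for illustration only, that each candidate is a (delay, gain) pair, that distortion is plain squared error, and that only the predicted excitation drives the synthesis filter; none of these details come from the source.

```python
def lpc_synthesize(excitation, lpc):
    """All-pole synthesis 1/A(z) with A(z) = 1 + sum(lpc[k-1] * z^-k) (assumed convention)."""
    mem = [0.0] * len(lpc)  # most recent output first
    out = []
    for x in excitation:
        y = x - sum(a * m for a, m in zip(lpc, mem))
        out.append(y)
        mem = ([y] + mem)[:len(lpc)]
    return out


def predict_excitation(mono_excitation, delay, gain):
    """Delayed, scaled copy of the monaural excitation (hypothetical filter form)."""
    n = len(mono_excitation)
    shifted = ([0.0] * delay + list(mono_excitation))[:n]
    return [gain * x for x in shifted]


def select_filter_candidate(candidates, mono_excitation, target, lpc):
    """Return (index, distortion) of the candidate whose synthesis is closest to target."""
    best_index, best_dist = None, float("inf")
    for i, (delay, gain) in enumerate(candidates):
        synth = lpc_synthesize(predict_excitation(mono_excitation, delay, gain), lpc)
        dist = sum((t - s) ** 2 for t, s in zip(target, synth))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```

Compared with an open-loop fit on the residual alone, this evaluates each candidate in the synthesized-signal domain, which is where the coding distortion is measured.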
- each signal is multiplied by its own adjustment gain in order to adjust the gain balance among the three types of signals.
- the gain may be multiplied only for the predicted driving sound source signal corresponding to the N-th audio signal.
- the monaural signal encoded data obtained by CELP encoding of the monaural signal may be used, and the differential component (correction component) relative to the monaural signal encoded data may be encoded.
- for example, the difference value from the adaptive excitation lag obtained by CELP coding of the monaural signal, the relative ratio to the adaptive excitation gain and the fixed excitation gain, and so on.
- the coding efficiency for the CELP sound source of each channel can be improved.
- enhancement layer encoding section 120 of speech encoding apparatus 600 may consist only of the configuration related to the 1st ch, as in Embodiment 2 (FIG. 7). That is, enhancement layer coding section 120 performs prediction of the driving excitation signal using the monaural driving excitation signal, and CELP coding of the prediction residual component, only for the first channel audio signal.
- enhancement layer decoding section 320 of speech decoding apparatus 700 FIG. 11
- Embodiment 2 FIG.
- 1st ch and 2nd ch CELP encoding sections 132 and 133 and 1st ch and 2nd ch CELP decoding units 342 and 343 may use only one of the adaptive excitation and the fixed excitation as the excitation structure in the excitation search.
- the Nth channel prediction filter parameters may be calculated using the Nth channel audio signal instead of the LPC prediction residual signal, and the monaural signal s_mono(n) generated by monaural signal generation unit 111 instead of the monaural driving excitation signal.
- FIG. 15 shows the configuration of speech coding apparatus 750 in this case
- FIG. 16 shows the configuration of first chCELP coding section 141 and second chCELP coding section 142.
- the monaural signal s_mono (n) generated by the monaural signal generation unit 111 is input to the first chCELP encoding unit 141 and the second chCELP encoding unit 142.
- Nth channel prediction filter analysis unit 403 of 1st ch CELP coding unit 141 and 2nd ch CELP coding unit 142 shown in FIG. 16 uses the Nth channel speech signal and the monaural signal s_mono(n) to obtain the Nth channel prediction filter parameters.
- the processing for calculating the LPC prediction residual signal from the Nth channel speech signal using the Nth channel quantized LPC parameters becomes unnecessary.
- by using the monaural signal s_mono(n) instead of the monaural driving excitation signal, the Nth channel prediction filter parameters can be obtained using a signal later in time (in the future) than when the monaural driving excitation signal is used.
- Nth channel prediction filter analysis unit 403 may use the monaural decoded signal obtained by encoding in monaural signal CELP encoding unit 114 instead of the monaural signal s_mono(n) generated by monaural signal generation unit 111.
- the decoding side Nch adaptive codebook must have the same configuration.
- in 1st ch and 2nd ch CELP encoding units 132 and 133, excitation search in the time domain by CELP encoding is performed on the residual component excitation signal relative to the predicted driving excitation signal of each channel.
- the residual component excitation signal may be converted to the frequency domain, and the residual component excitation signal may be encoded in the frequency domain.
- CELP coding suitable for speech coding is used, so that more efficient coding can be performed.
- FIG. 17 shows the configuration of speech encoding apparatus 800 according to the present embodiment.
- Speech encoding device 800 includes core layer encoding section 110 and enhancement layer encoding section 120.
- the configuration of core layer encoding section 110 is the same as that of Embodiment 1 (FIG. 1), and thus the description thereof is omitted.
- Enhancement layer coding section 120 includes monaural signal LPC analysis section 134, monaural LPC residual signal generation section 135, 1st ch CELP coding section 136, and 2nd ch CELP coding section 137.
- Monaural signal LPC analysis unit 134 calculates LPC parameters for the monaural decoded signal and outputs these monaural signal LPC parameters to monaural LPC residual signal generation unit 135, 1st ch CELP coding unit 136, and 2nd ch CELP coding unit 137.
- Monaural LPC residual signal generation unit 135 generates an LPC residual signal for the monaural decoded signal (the monaural LPC residual signal) using the LPC parameters, and outputs it to 1st ch CELP coding unit 136 and 2nd ch CELP coding unit 137.
- 1st ch CELP coding unit 136 and 2nd ch CELP coding unit 137 perform CELP coding on the audio signal of each channel using the LPC parameters and the LPC residual signal for the monaural decoded signal, and output the encoded data of each channel.
- FIG. 18 shows the configuration of 1st ch CELP coding section 136 and 2nd ch CELP coding section 137.
- In FIG. 18, the same components as those in Embodiment 3 (FIG. 10) are denoted by the same reference numerals, and description thereof is omitted.
- Nth channel LPC analysis unit 413 performs LPC analysis on the Nth channel speech signal, quantizes the obtained LPC parameters, and outputs them to Nth channel LPC prediction residual signal generation unit 402 and synthesis filter 409. It also outputs the Nth channel LPC quantization code.
- When quantizing the LPC parameters, Nth channel LPC analysis unit 413 exploits the high correlation between the LPC parameters for the monaural signal and the LPC parameters obtained from the Nth channel audio signal (the Nth channel LPC parameters), and performs efficient quantization by quantizing the difference component of the Nth channel LPC parameters relative to the monaural signal LPC parameters.
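The benefit of quantizing the difference from the monaural LPC parameters can be seen in a small example. This is a sketch under stated assumptions: the source does not specify the quantizer, so a uniform scalar quantizer with a hypothetical step size is used here, and the parameter values are made up. Practical coders typically quantize LPC parameters in another domain (for example LSPs) with vector quantization.

```python
def quantize_lpc_difference(channel_params, mono_params, step=0.05):
    """Quantize (channel - mono) per coefficient with a uniform step (assumed quantizer)."""
    return [round((c - m) / step) for c, m in zip(channel_params, mono_params)]


def dequantize_lpc_difference(codes, mono_params, step=0.05):
    """Rebuild the channel parameters from the mono parameters plus the coded difference."""
    return [m + q * step for q, m in zip(codes, mono_params)]


# Illustrative (made-up) parameter values: the channel is close to the mono signal,
# so the difference codes stay small.
mono = [0.30, 0.55, 0.80]
channel = [0.32, 0.50, 0.87]
codes = quantize_lpc_difference(channel, mono)
rebuilt = dequantize_lpc_difference(codes, mono)
```

Because the difference codes cluster around zero when the two sets of parameters are strongly correlated, they can be represented with fewer bits than the parameters themselves.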
- N-th channel prediction filter analysis unit 414 obtains and quantizes the Nth channel prediction filter parameters using the LPC prediction residual signal output from N-th channel LPC prediction residual signal generation unit 402 and the monaural LPC residual signal output from monaural LPC residual signal generation unit 135, outputs the Nth channel prediction filter quantization parameter to Nth channel driving excitation signal synthesizer 415, and outputs the Nth channel prediction filter quantization code.
- N-th channel driving excitation signal synthesizer 415 uses the monaural LPC residual signal and the N-th channel prediction filter quantization parameter to synthesize a predicted driving excitation signal corresponding to the N-th channel audio signal, and outputs it to multiplier 407-1.
- The speech decoding apparatus corresponding to speech coding apparatus 800 calculates the LPC parameters and the LPC residual signal for the monaural decoded signal in the same manner as speech coding apparatus 800, and uses them in the CELP decoding section of each channel to synthesize the driving excitation signal of each channel.
- Instead of the LPC prediction residual signal output from Nth channel LPC prediction residual signal generation unit 402 and the monaural LPC residual signal output from monaural LPC residual signal generation unit 135, the Nth channel prediction filter parameters may be obtained using the Nth channel audio signal and the monaural signal s_mono(n) generated by monaural signal generation unit 111. Furthermore, a monaural decoded signal may be used instead of the monaural signal s_mono(n) generated by monaural signal generation unit 111.
- Because monaural signal LPC analysis section 134 and monaural LPC residual signal generation section 135 are provided, CELP coding can be used in the enhancement layer even when the monaural signal is encoded by an arbitrary encoding method in the core layer.
- The speech encoding apparatus and speech decoding apparatus can also be mounted on a wireless communication apparatus, such as a wireless communication mobile station apparatus or a wireless communication base station apparatus, used in a mobile communication system.
- Each functional block used in the description of each of the above embodiments is typically realized as an LSI which is an integrated circuit. These may be individually arranged on one chip, or may be integrated into one chip so as to include a part or all of them.
- The term LSI is used here, but depending on the degree of integration it may also be called an IC (integrated circuit), system LSI, super LSI, or ultra LSI.
- the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor.
- An FPGA (Field Programmable Gate Array), or a reconfigurable processor that can reconfigure the connections and settings of circuit cells inside the LSI, may also be used.
- the present invention can be applied to the use of a communication device in a mobile communication system or a packet communication system using the Internet protocol.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/722,737 US7945447B2 (en) | 2004-12-27 | 2005-12-26 | Sound coding device and sound coding method |
BRPI0516376-5A BRPI0516376A (pt) | 2004-12-27 | 2005-12-26 | dispositivo de codificação de som e método de codificação de som |
JP2006550764A JP5046652B2 (ja) | 2004-12-27 | 2005-12-26 | 音声符号化装置および音声符号化方法 |
EP05820404A EP1818911B1 (en) | 2004-12-27 | 2005-12-26 | Sound coding device and sound coding method |
AT05820404T ATE545131T1 (de) | 2004-12-27 | 2005-12-26 | Tonkodierungsvorrichtung und tonkodierungsmethode |
CN2005800450695A CN101091208B (zh) | 2004-12-27 | 2005-12-26 | 语音编码装置和语音编码方法 |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-377965 | 2004-12-27 | ||
JP2004377965 | 2004-12-27 | ||
JP2005-237716 | 2005-08-18 | ||
JP2005237716 | 2005-08-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006070751A1 (ja) | 2006-07-06 |
Family
ID=36614868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/023802 WO2006070751A1 (ja) | 2004-12-27 | 2005-12-26 | 音声符号化装置および音声符号化方法 |
Country Status (8)
Country | Link |
---|---|
US (1) | US7945447B2 (ja) |
EP (1) | EP1818911B1 (ja) |
JP (1) | JP5046652B2 (ja) |
KR (1) | KR20070092240A (ja) |
CN (1) | CN101091208B (ja) |
AT (1) | ATE545131T1 (ja) |
BR (1) | BRPI0516376A (ja) |
WO (1) | WO2006070751A1 (ja) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008016098A1 (fr) * | 2006-08-04 | 2008-02-07 | Panasonic Corporation | dispositif de codage audio stéréo, dispositif de décodage audio stéréo et procédé de ceux-ci |
JP2010540985A (ja) * | 2007-09-19 | 2010-12-24 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | マルチチャネル・オーディオのジョイント強化 |
US8150702B2 (en) | 2006-08-04 | 2012-04-03 | Panasonic Corporation | Stereo audio encoding device, stereo audio decoding device, and method thereof |
KR101274802B1 (ko) | 2008-12-29 | 2013-06-13 | 모토로라 모빌리티 엘엘씨 | 오디오 신호를 인코딩하기 위한 장치 및 방법 |
KR101274827B1 (ko) | 2008-12-29 | 2013-06-13 | 모토로라 모빌리티 엘엘씨 | 다수 채널 오디오 신호를 디코딩하기 위한 장치 및 방법, 및 다수 채널 오디오 신호를 코딩하기 위한 방법 |
KR101275892B1 (ko) | 2008-12-29 | 2013-06-17 | 모토로라 모빌리티 엘엘씨 | 오디오 신호를 인코딩하고 디코딩하기 위한 방법 및 장치 |
US9330671B2 (en) | 2008-10-10 | 2016-05-03 | Telefonaktiebolaget L M Ericsson (Publ) | Energy conservative multi-channel audio coding |
JP2018511825A (ja) * | 2015-03-09 | 2018-04-26 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | マルチチャンネル信号を符号化するためのオーディオエンコーダおよび符号化されたオーディオ信号を復号化するためのオーディオデコーダ |
WO2020250470A1 (ja) * | 2019-06-13 | 2020-12-17 | 日本電信電話株式会社 | 音信号受信復号方法、音信号復号方法、音信号受信側装置、復号装置、プログラム及び記録媒体 |
WO2020250370A1 (ja) * | 2019-06-13 | 2020-12-17 | 日本電信電話株式会社 | 音信号受信復号方法、音信号復号方法、音信号受信側装置、復号装置、プログラム及び記録媒体 |
WO2020250472A1 (ja) * | 2019-06-13 | 2020-12-17 | 日本電信電話株式会社 | 音信号受信復号方法、音信号符号化送信方法、音信号復号方法、音信号符号化方法、音信号受信側装置、音信号送信側装置、復号装置、符号化装置、プログラム及び記録媒体 |
WO2022097242A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号高域補償方法、音信号後処理方法、音信号復号方法、これらの装置、プログラム、および記録媒体 |
WO2022097238A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号精製方法、音信号復号方法、これらの装置、プログラム及び記録媒体 |
WO2022097240A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号高域補償方法、音信号後処理方法、音信号復号方法、これらの装置、プログラム、および記録媒体 |
WO2022097241A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号高域補償方法、音信号後処理方法、音信号復号方法、これらの装置、プログラム、および記録媒体 |
WO2022097243A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号高域補償方法、音信号後処理方法、音信号復号方法、これらの装置、プログラム、および記録媒体 |
WO2022097237A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号精製方法、音信号復号方法、これらの装置、プログラム及び記録媒体 |
WO2022097239A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号精製方法、音信号復号方法、これらの装置、プログラム及び記録媒体 |
WO2022097244A1 (ja) * | 2020-11-05 | 2022-05-12 | 日本電信電話株式会社 | 音信号高域補償方法、音信号後処理方法、音信号復号方法、これらの装置、プログラム、および記録媒体 |
WO2023032065A1 (ja) | 2021-09-01 | 2023-03-09 | 日本電信電話株式会社 | 音信号ダウンミックス方法、音信号符号化方法、音信号ダウンミックス装置、音信号符号化装置、プログラム |
US12100403B2 (en) | 2020-03-09 | 2024-09-24 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, program and recording medium |
US12136427B2 (en) | 2020-03-09 | 2024-11-05 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, program and recording medium |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4939933B2 (ja) * | 2004-05-19 | 2012-05-30 | パナソニック株式会社 | オーディオ信号符号化装置及びオーディオ信号復号化装置 |
US8036390B2 (en) * | 2005-02-01 | 2011-10-11 | Panasonic Corporation | Scalable encoding device and scalable encoding method |
CN1889172A (zh) * | 2005-06-28 | 2007-01-03 | 松下电器产业株式会社 | 可增加和修正声音类别的声音分类系统及方法 |
JPWO2007037359A1 (ja) * | 2005-09-30 | 2009-04-16 | パナソニック株式会社 | 音声符号化装置および音声符号化方法 |
WO2007052612A1 (ja) * | 2005-10-31 | 2007-05-10 | Matsushita Electric Industrial Co., Ltd. | ステレオ符号化装置およびステレオ信号予測方法 |
WO2007105586A1 (ja) | 2006-03-10 | 2007-09-20 | Matsushita Electric Industrial Co., Ltd. | 符号化装置および符号化方法 |
JP5190363B2 (ja) | 2006-07-12 | 2013-04-24 | パナソニック株式会社 | 音声復号装置、音声符号化装置、および消失フレーム補償方法 |
US7461106B2 (en) | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals |
FR2911031B1 (fr) * | 2006-12-28 | 2009-04-10 | Actimagine Soc Par Actions Sim | Procede et dispositif de codage audio |
FR2911020B1 (fr) * | 2006-12-28 | 2009-05-01 | Actimagine Soc Par Actions Sim | Procede et dispositif de codage audio |
US20100241434A1 (en) * | 2007-02-20 | 2010-09-23 | Kojiro Ono | Multi-channel decoding device, multi-channel decoding method, program, and semiconductor integrated circuit |
US8576096B2 (en) | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals |
US8209190B2 (en) * | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system |
US7889103B2 (en) | 2008-03-13 | 2011-02-15 | Motorola Mobility, Inc. | Method and apparatus for low complexity combinatorial coding of signals |
US8639519B2 (en) | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance |
KR101428487B1 (ko) * | 2008-07-11 | 2014-08-08 | 삼성전자주식회사 | 멀티 채널 부호화 및 복호화 방법 및 장치 |
CN101635145B (zh) * | 2008-07-24 | 2012-06-06 | 华为技术有限公司 | 编解码方法、装置和系统 |
US8175888B2 (en) | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system |
CN102804262A (zh) * | 2009-06-05 | 2012-11-28 | 皇家飞利浦电子股份有限公司 | 音频信号的上混合 |
US8423355B2 (en) | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames |
US9514757B2 (en) | 2010-11-17 | 2016-12-06 | Panasonic Intellectual Property Corporation Of America | Stereo signal encoding device, stereo signal decoding device, stereo signal encoding method, and stereo signal decoding method |
US9129600B2 (en) | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal |
EP2919232A1 (en) | 2014-03-14 | 2015-09-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder, decoder and method for encoding and decoding |
CN110709925B (zh) * | 2017-04-10 | 2023-09-29 | 诺基亚技术有限公司 | 用于音频编码或解码的方法及装置 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US543948A (en) * | 1895-08-06 | Registering mechanism for cyclometers | ||
US5434948A (en) | 1989-06-15 | 1995-07-18 | British Telecommunications Public Limited Company | Polyphonic coding |
DE4320990B4 (de) * | 1993-06-05 | 2004-04-29 | Robert Bosch Gmbh | Verfahren zur Redundanzreduktion |
DE19742655C2 (de) | 1997-09-26 | 1999-08-05 | Fraunhofer Ges Forschung | Verfahren und Vorrichtung zum Codieren eines zeitdiskreten Stereosignals |
KR100335609B1 (ko) * | 1997-11-20 | 2002-10-04 | 삼성전자 주식회사 | 비트율조절이가능한오디오부호화/복호화방법및장치 |
US6446037B1 (en) * | 1999-08-09 | 2002-09-03 | Dolby Laboratories Licensing Corporation | Scalable coding method for high quality audio |
SE519985C2 (sv) * | 2000-09-15 | 2003-05-06 | Ericsson Telefon Ab L M | Kodning och avkodning av signaler från flera kanaler |
DE10102159C2 (de) * | 2001-01-18 | 2002-12-12 | Fraunhofer Ges Forschung | Verfahren und Vorrichtung zum Erzeugen bzw. Decodieren eines skalierbaren Datenstroms unter Berücksichtigung einer Bitsparkasse, Codierer und skalierbarer Codierer |
SE0202159D0 (sv) * | 2001-07-10 | 2002-07-09 | Coding Technologies Sweden Ab | Efficientand scalable parametric stereo coding for low bitrate applications |
ES2268340T3 (es) * | 2002-04-22 | 2007-03-16 | Koninklijke Philips Electronics N.V. | Representacion de audio parametrico de multiples canales. |
WO2004072956A1 (en) * | 2003-02-11 | 2004-08-26 | Koninklijke Philips Electronics N.V. | Audio coding |
US7725324B2 (en) * | 2003-12-19 | 2010-05-25 | Telefonaktiebolaget Lm Ericsson (Publ) | Constrained filter encoding of polyphonic signals |
DE602005016130D1 (de) | 2004-09-30 | 2009-10-01 | Panasonic Corp | Einrichtung für skalierbare codierung, einrichtung für skalierbare decodierung und verfahren dafür |
SE0402650D0 (sv) * | 2004-11-02 | 2004-11-02 | Coding Tech Ab | Improved parametric stereo compatible coding of spatial audio |
- 2005
- 2005-12-26 US US11/722,737 patent/US7945447B2/en active Active
- 2005-12-26 AT AT05820404T patent/ATE545131T1/de active
- 2005-12-26 WO PCT/JP2005/023802 patent/WO2006070751A1/ja active Application Filing
- 2005-12-26 CN CN2005800450695A patent/CN101091208B/zh not_active Expired - Fee Related
- 2005-12-26 KR KR1020077014562A patent/KR20070092240A/ko not_active Application Discontinuation
- 2005-12-26 BR BRPI0516376-5A patent/BRPI0516376A/pt not_active Application Discontinuation
- 2005-12-26 JP JP2006550764A patent/JP5046652B2/ja not_active Expired - Fee Related
- 2005-12-26 EP EP05820404A patent/EP1818911B1/en not_active Not-in-force
Non-Patent Citations (5)
Title |
---|
BAUMGARTE F. AND FALLER C.: "Binaural Cue Coding-Part I: Psychoacoustic Fundamentals and Design Principles", IEEE TRANS. ON SPEECH AND AUDIO PROCESSING, vol. 11, no. 6, 2003, pages 509 - 519, XP002996341 * |
GOTO M. ET AL.: "Onsei Tsushin'yo Stereo Onsei Fugoka Hoho no Kento.(A Study of Stereo Speech Coding Methods for Speech Communications.)", 2004 NEN THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS ENGINEERING SCIENCES SOCIETY TAIKAI KOEN RONBUNSHU, vol. A-6-6, 8 September 2004 (2004-09-08), pages 119, XP002996344 * |
KAMAMOTO Y. ET AL.: "Channel-Kan Sokan o Mochiita Ta-Channel Shingo no Kagyaku Asshuku Fugoka.(Lossless Compression of Multi-Channel Signals Using Inter-Channel Correlation.)", FIT2004 (DAI 3 KAI FORUM ON INFORMATION TECHNOLOGY) KOEN RONBUNSHU, vol. M-016, 20 August 2004 (2004-08-20), pages 123 - 124, XP002996343 * |
KATAOKA A. ET AL.: "G.729 o Kosei Yoso Toshite Mochiiru Scalable Kotaiiki Onsei Fugoka.(Scalable Wideband Speech Coding Using G.729 as a Component.)", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, D-II, vol. J86-D-II, no. 3, 1 March 2003 (2003-03-01), pages 379 - 387, XP002996342 * |
YOSHIDA K. ET AL.: "Scalable Stereo Onsei Fugoka no channel-Kan Yosoku ni Kansuru Yobi Kento.(A Preliminary Study of Inter-Channel Prediction for Scalable Stereo Speech Coding.)", 2005 NEN THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS SOGO TAIKAI KOEN RONBUNSHU, vol. D-14-1, 7 March 2005 (2005-03-07), pages 118, XP002996345 * |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8150702B2 (en) | 2006-08-04 | 2012-04-03 | Panasonic Corporation | Stereo audio encoding device, stereo audio decoding device, and method thereof |
WO2008016098A1 (fr) * | 2006-08-04 | 2008-02-07 | Panasonic Corporation | dispositif de codage audio stéréo, dispositif de décodage audio stéréo et procédé de ceux-ci |
JP2010540985A (ja) * | 2007-09-19 | 2010-12-24 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | マルチチャネル・オーディオのジョイント強化 |
US9330671B2 (en) | 2008-10-10 | 2016-05-03 | Telefonaktiebolaget L M Ericsson (Publ) | Energy conservative multi-channel audio coding |
KR101274827B1 (ko) | 2008-12-29 | 2013-06-13 | 모토로라 모빌리티 엘엘씨 | 다수 채널 오디오 신호를 디코딩하기 위한 장치 및 방법, 및 다수 채널 오디오 신호를 코딩하기 위한 방법 |
KR101275892B1 (ko) | 2008-12-29 | 2013-06-17 | 모토로라 모빌리티 엘엘씨 | 오디오 신호를 인코딩하고 디코딩하기 위한 방법 및 장치 |
KR101274802B1 (ko) | 2008-12-29 | 2013-06-13 | 모토로라 모빌리티 엘엘씨 | 오디오 신호를 인코딩하기 위한 장치 및 방법 |
US11107483B2 (en) | 2015-03-09 | 2021-08-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
JP2018511825A (ja) * | 2015-03-09 | 2018-04-26 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | マルチチャンネル信号を符号化するためのオーディオエンコーダおよび符号化されたオーディオ信号を復号化するためのオーディオデコーダ |
US10388287B2 (en) | 2015-03-09 | 2019-08-20 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US10395661B2 (en) | 2015-03-09 | 2019-08-27 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US10777208B2 (en) | 2015-03-09 | 2020-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US11881225B2 (en) | 2015-03-09 | 2024-01-23 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US11741973B2 (en) | 2015-03-09 | 2023-08-29 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
JP2023029849A (ja) * | 2015-03-09 | 2023-03-07 | フラウンホッファー-ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | マルチチャンネル信号を符号化するためのオーディオエンコーダおよび符号化されたオーディオ信号を復号化するためのオーディオデコーダ |
US11238874B2 (en) | 2015-03-09 | 2022-02-01 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
JPWO2020250470A1 (ja) * | 2019-06-13 | 2020-12-17 | ||
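WO2020250470A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |
WO2020250471A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |
WO2020250371A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal encoding and transmission method, sound signal encoding method, sound signal transmitting-side apparatus, encoding apparatus, program, and recording medium |
WO2020250472A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal encoding and transmission method, sound signal decoding method, sound signal encoding method, sound signal receiving-side apparatus, sound signal transmitting-side apparatus, decoding apparatus, encoding apparatus, program, and recording medium |
WO2020250369A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |
JP7192986B2 (ja) | 2019-06-13 | 2022-12-20 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |
JP7205626B2 (ja) | 2019-06-13 | 2023-01-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal encoding and transmission method, sound signal decoding method, sound signal encoding method, sound signal receiving-side apparatus, sound signal transmitting-side apparatus, decoding apparatus, encoding apparatus, program, and recording medium |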
JPWO2020250472A1 (ja) * | 2019-06-13 | 2020-12-17 | ||
JPWO2020250471A1 (ja) * | 2019-06-13 | 2020-12-17 | ||
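JP7192987B2 (ja) | 2019-06-13 | 2022-12-20 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |
WO2020250370A1 (ja) * | 2019-06-13 | 2020-12-17 | Nippon Telegraph And Telephone Corporation | Sound signal receiving and decoding method, sound signal decoding method, sound signal receiving-side apparatus, decoding apparatus, program, and recording medium |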
US12100403B2 (en) | 2020-03-09 | 2024-09-24 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, program and recording medium |
US12119009B2 (en) | 2020-03-09 | 2024-10-15 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, program and recording medium |
US12136427B2 (en) | 2020-03-09 | 2024-11-05 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, program and recording medium |
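WO2022097240A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7517461B2 (ja) | 2020-11-05 | 2024-07-17 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097239A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097242A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097237A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097243A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7491393B2 (ja) | 2020-11-05 | 2024-05-28 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7491394B2 (ja) | 2020-11-05 | 2024-05-28 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7491395B2 (ja) | 2020-11-05 | 2024-05-28 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097244A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7517458B2 (ja) | 2020-11-05 | 2024-07-17 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7517460B2 (ja) | 2020-11-05 | 2024-07-17 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7517459B2 (ja) | 2020-11-05 | 2024-07-17 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
JP7544139B2 (ja) | 2020-11-05 | 2024-09-03 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097241A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal high-band compensation method, sound signal post-processing method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2022097238A1 (ja) * | 2020-11-05 | 2022-05-12 | Nippon Telegraph And Telephone Corporation | Sound signal refinement method, sound signal decoding method, apparatuses therefor, program, and recording medium |
WO2023032065A1 (ja) | 2021-09-01 | 2023-03-09 | Nippon Telegraph And Telephone Corporation | Sound signal downmixing method, sound signal coding method, sound signal downmixing apparatus, sound signal coding apparatus, and program |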
Also Published As
Publication number | Publication date |
---|---|
US20080010072A1 (en) | 2008-01-10 |
CN101091208B (zh) | 2011-07-13 |
EP1818911A4 (en) | 2008-03-19 |
CN101091208A (zh) | 2007-12-19 |
EP1818911B1 (en) | 2012-02-08 |
KR20070092240A (ko) | 2007-09-12 |
JPWO2006070751A1 (ja) | 2008-06-12 |
BRPI0516376A (pt) | 2008-09-02 |
ATE545131T1 (de) | 2012-02-15 |
EP1818911A1 (en) | 2007-08-15 |
JP5046652B2 (ja) | 2012-10-10 |
US7945447B2 (en) | 2011-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
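JP5046652B2 (ja) | | Speech coding apparatus and speech coding method |
JP4850827B2 (ja) | | Speech coding apparatus and speech coding method |
JP5046653B2 (ja) | | Speech coding apparatus and speech coding method |
JP4907522B2 (ja) | | Speech coding apparatus and speech coding method |
JP5413839B2 (ja) | | Encoding apparatus and decoding apparatus |
JP4555299B2 (ja) | | Scalable encoding apparatus and scalable encoding method |
US7904292B2 (en) | | Scalable encoding device, scalable decoding device, and method thereof |
CN101023470A (zh) | | Speech encoding apparatus, speech decoding apparatus, communication apparatus, and speech encoding method |
US8271275B2 (en) | | Scalable encoding device, and scalable encoding method |
JP4937746B2 (ja) | | Speech coding apparatus and speech coding method |
JP2006072269A (ja) | | Speech coding apparatus, communication terminal apparatus, base station apparatus, and speech coding method |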
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 2006550764; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 11722737; Country of ref document: US |
| | WWE | WIPO information: entry into national phase | Ref document number: 2005820404; Country of ref document: EP. Ref document number: 1020077014562; Country of ref document: KR |
| | WWE | WIPO information: entry into national phase | Ref document number: 200580045069.5; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2005820404; Country of ref document: EP |
| | WWP | WIPO information: published in national office | Ref document number: 11722737; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: PI0516376; Country of ref document: BR |