WO2008016097A1 - Stereo audio encoding device, stereo audio decoding device, and method thereof - Google Patents


Info

Publication number
WO2008016097A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
channel
reconstructed
cross
monaural
Prior art date
Application number
PCT/JP2007/065132
Other languages
French (fr)
Japanese (ja)
Inventor
Jiong Zhou
Kok Seng Chong
Original Assignee
Panasonic Corporation
Priority date
Filing date
Publication date
Application filed by Panasonic Corporation
Priority to US12/376,000 (US8150702B2)
Priority to EP07791812.6A (EP2048658B1)
Priority to JP2008527782A (JP4999846B2)
Publication of WO2008016097A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S1/00: Two-channel systems
    • H04S1/007: Two-channel systems in which the audio signals are in digital form

Definitions

  • Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
  • The present invention relates to a stereo speech coding apparatus and a stereo speech decoding apparatus used for encoding and decoding a stereo speech signal in a mobile communication system or a packet communication system using the Internet protocol (IP), and to methods thereof.
  • IP: Internet protocol
  • Binaural cue coding (BCC) is one technique for encoding the spatial information contained in a stereo audio signal.
  • In binaural cue coding, the encoding side encodes a monaural signal generated by combining the signals of the multiple channels that make up the stereo audio signal, and also calculates and encodes the cues between the channel signals (inter-channel cues).
  • Inter-channel cues are sub-information used to predict channel signals from monaural signals.
  • FIG. 1 is a block diagram showing the main configuration of stereo audio encoding apparatus 10 disclosed in Non-Patent Document 1.
  • In FIG. 1, a monaural signal generation unit 11 generates a monaural signal (M) using the L channel signal and the R channel signal that constitute an input stereo audio signal, and outputs the monaural signal (M) to the monaural signal encoding unit 12.
  • The monaural signal encoding unit 12 encodes the monaural signal generated by the monaural signal generation unit 11 to generate a monaural signal encoding parameter, and outputs it to the multiplexing unit 14.
  • The inter-channel cue calculation unit 13 calculates inter-channel cues, including the ILD, ITD, and ICC of the input L channel signal and R channel signal, and outputs them to the multiplexing unit 14.
  • The multiplexing unit 14 multiplexes the monaural signal encoding parameter input from the monaural signal encoding unit 12 and the inter-channel cues input from the inter-channel cue calculation unit 13, and sends the obtained bit stream to stereo audio decoding device 20.
  • FIG. 2 is a block diagram showing the main configuration of stereo audio decoding apparatus 20 disclosed in Non-Patent Document 1.
  • The separation unit 21 performs separation processing on the bit stream transmitted from the stereo audio encoding device 10, outputs the obtained monaural signal encoding parameter to the monaural signal decoding unit 22, and outputs the obtained inter-channel cues to the first cue synthesis unit 24 and the second cue synthesis unit 25.
  • The monaural signal decoding unit 22 performs a decoding process using the monaural signal encoding parameter input from the separation unit 21, and outputs the obtained monaural decoded signal to the all-pass filter 23, the first cue synthesis unit 24, and the second cue synthesis unit 25.
  • The all-pass filter 23 delays the monaural decoded signal input from the monaural signal decoding unit 22 by a predetermined time, and outputs the generated monaural reverberation signal (M'_Rev) to the first cue synthesis unit 24 and the second cue synthesis unit 25.
  • The first cue synthesis unit 24 performs a decoding process using the inter-channel cues input from the separation unit 21, the monaural decoded signal input from the monaural signal decoding unit 22, and the monaural reverberation signal input from the all-pass filter 23, and outputs the obtained L channel decoded signal (L').
  • The second cue synthesis unit 25 performs a decoding process using the inter-channel cues input from the separation unit 21, the monaural decoded signal input from the monaural signal decoding unit 22, and the monaural reverberation signal input from the all-pass filter 23, and outputs the obtained R channel decoded signal (R').
  • Conventional mobile phones can already be equipped with a multimedia player having a stereo function and an FM radio function. Functions such as recording and playback of stereo audio signals are further expected to be added to fourth-generation mobile phones and IP phones.
  • Non-Patent Document 1: ISO/IEC 14496-3:2005 Part 3 Audio, 8.6.4 Parametric stereo
  • Non-Patent Document 2: ISO/IEC 23003-1:2006/FCD MPEG Surround (ISO/IEC 23003-1: 20
  • The stereo speech coding apparatus of the present invention adopts a configuration including: first calculation means for calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo speech;
  • stereo speech reconstructing means for generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal;
  • second calculation means for calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and
  • comparing means for comparing the first cross-correlation coefficient with the second cross-correlation coefficient to obtain a cross-correlation comparison result including spatial information of the stereo speech.
  • The stereo speech decoding apparatus of the present invention obtains, from a received bit stream, a first parameter and a second parameter generated in the coding apparatus for the first channel signal and the second channel signal constituting stereo speech, respectively,
  • together with a cross-correlation comparison result obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal.
  • The apparatus adopts a configuration including: stereo speech decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;
  • stereo reverberation signal generating means for generating a first channel reverberation signal using the first channel reconstructed decoded signal and a second channel reverberation signal using the second channel reconstructed decoded signal; first spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result; and
  • second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result.
  • two cross-correlation coefficients are compared as spatial information related to inter-channel cross-correlation (ICC), and the comparison result is transmitted to the stereo decoding side.
  • ICC inter-channel cross-correlation
  • the spatial image of the decoded stereo audio signal can be improved.
  • FIG. 1 is a block diagram showing the main configuration of a stereo audio encoding device according to the prior art.
  • FIG. 2 is a block diagram showing the main configuration of a stereo audio decoding device according to the prior art.
  • FIG. 3 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 1 of the present invention.
  • FIG. 4 is a block diagram showing a main configuration inside a stereo speech reconstruction unit according to Embodiment 1 of the present invention.
  • FIG. 5 is a diagram for illustrating the configuration and operation of an adaptive filter according to Embodiment 1 of the present invention.
  • FIG. 6 is a flowchart showing an example of the procedure of stereo speech coding processing in the stereo speech coding apparatus according to Embodiment 1 of the present invention.
  • FIG. 7 is a block diagram showing the main configuration of the stereo speech decoding apparatus according to Embodiment 1 of the present invention.
  • FIG. 8 is a block diagram showing a main configuration inside a stereo speech decoding unit according to Embodiment 1 of the present invention.
  • FIG. 9 is a flowchart showing an example of a procedure of stereo audio decoding processing in the stereo audio decoding device according to Embodiment 1 of the present invention.
  • FIG. 10 is a block diagram showing the main configuration of a stereo speech decoding apparatus according to Embodiment 2 of the present invention.
  • In the following description, a stereo audio signal is composed of a left (L) channel and a right (R) channel.
  • The stereo speech coding apparatus according to each embodiment calculates the cross-correlation coefficient C1 between the original L channel signal and R channel signal that are input to it.
  • The stereo speech coding apparatus according to each embodiment also includes a local stereo speech reconstructing unit, which reconstructs the L channel signal and the R channel signal; the cross-correlation coefficient C2 between the reconstructed L channel signal and R channel signal is then calculated.
  • The stereo speech coding apparatus compares the cross-correlation coefficient C1 with the cross-correlation coefficient C2.
  • FIG. 3 is a block diagram showing the main configuration of stereo speech coding apparatus 100 according to Embodiment 1 of the present invention.
  • The stereo speech coding apparatus 100 performs stereo speech coding processing using the input L channel signal and R channel signal of the stereo signal.
  • The resulting bit stream is transmitted to a stereo speech decoding apparatus 200 described later.
  • the stereo speech decoding apparatus 200 corresponding to the stereo speech coding apparatus 100 outputs a decoded signal of either a monaural signal or a stereo signal, thereby realizing monaural / stereo scalable coding.
  • The original cross-correlation calculation unit 101 calculates the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R) constituting the stereo audio signal according to equation (1), and outputs it to the cross-correlation comparison unit 106.
  • In equation (1), n is the sample number on the time axis.
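Equation (1) itself is not reproduced in this text. As a minimal sketch, assuming it is the usual normalized cross-correlation over the samples of one frame, the coefficient between two channel frames could be computed as follows. The function name `cross_correlation` and the zero-energy fallback are illustrative assumptions, not part of the patent.

```python
import math


def cross_correlation(x, y):
    """Normalized cross-correlation coefficient of two equal-length frames.

    Assumed form: sum(x*y) / sqrt(sum(x^2) * sum(y^2)).
    """
    if len(x) != len(y):
        raise ValueError("frames must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # Assumption: a silent frame is treated as uncorrelated.
    return numerator / denominator if denominator > 0.0 else 0.0


# Example frames (hypothetical data): R is a scaled copy of L,
# so the two channels are fully correlated.
left = [1.0, 2.0, 3.0, 4.0]
right = [2.0, 4.0, 6.0, 8.0]
c_orig = cross_correlation(left, right)
```

The same helper would apply unchanged to the reconstructed pair (L', R'), which is why the encoder can reuse one routine for both coefficients.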
  • The monaural signal generation unit 102 generates a monaural signal (M) from the L channel signal (L) and the R channel signal (R), for example according to equation (2) below, and outputs the generated monaural signal (M) to the monaural signal encoding unit 103 and the stereo audio reconstruction unit 104.
  • $M(n) = \frac{1}{2}\,[L(n) + R(n)]$ ... (2), where n is the sample number on the time axis.
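The downmix step can be sketched in a few lines. This assumes equation (2) is the plain per-sample average of the two channels; the function name `downmix` is hypothetical.

```python
def downmix(left, right):
    """Per-sample average of the L and R channel frames (assumed form of equation (2))."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [0.5 * (l_n + r_n) for l_n, r_n in zip(left, right)]


# Hypothetical two-sample frame.
mono = downmix([1.0, -1.0], [3.0, 1.0])
```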
  • The monaural signal encoding unit 103 performs an audio encoding process such as AMR-WB (Adaptive MultiRate-WideBand) on the monaural signal input from the monaural signal generation unit 102, and outputs the obtained monaural signal encoding parameter to the stereo audio reconstruction unit 104 and the multiplexing unit 107.
  • Stereo audio reconstructing section 104 encodes the L channel signal (L) and the R channel signal (R) using the monaural signal (M) input from monaural signal generating section 102, and outputs the obtained L channel adaptive filter parameter and R channel adaptive filter parameter to multiplexing section 107.
  • Stereo audio reconstructing section 104 also performs decoding processing using the obtained L channel adaptive filter parameter, the R channel adaptive filter parameter, and the monaural signal encoding parameter input from monaural signal encoding section 103, and outputs the obtained L channel reconstructed signal (L') and R channel reconstructed signal (R') to the reconstructed cross-correlation calculating unit 105. Details of the stereo audio reconstruction unit 104 will be described later.
  • The reconstructed cross-correlation calculating unit 105 calculates the cross-correlation coefficient C2 between the L channel reconstructed signal (L') and the R channel reconstructed signal (R') input from the stereo speech reconstructing unit 104, and outputs it to the cross-correlation comparison unit 106.
  • Here, n is the sample number on the time axis, and R'(n) is the R channel reconstructed signal.
  • The cross-correlation comparison unit 106 compares the cross-correlation coefficient C1 input from the original cross-correlation calculation unit 101 with the cross-correlation coefficient C2 input from the reconstructed cross-correlation calculation unit 105 according to equation (4), and obtains the cross-correlation comparison result α.
  • The cross-correlation value C2 between the reconstructed stereo signals is usually greater than the cross-correlation value C1 between the original stereo signals.
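Equation (4) is not reproduced in this text, so the exact form of the comparison is unknown here. A minimal sketch, assuming the comparison result is simply the ratio of the original-signal coefficient to the reconstructed-signal coefficient, is shown below. Under that assumption the result falls below 1 whenever the reconstructed pair is more correlated than the original pair. The function name `comparison_result` is hypothetical.

```python
def comparison_result(c_orig, c_recon):
    """Ratio of the original to the reconstructed cross-correlation coefficient.

    Assumption: equation (4) is a plain ratio; the source text does not show it.
    """
    if c_recon == 0.0:
        raise ValueError("reconstructed cross-correlation must be non-zero")
    return c_orig / c_recon


# Hypothetical values: the reconstructed pair is more correlated than the original.
alpha = comparison_result(0.6, 0.8)
```

A single scalar per frame is cheap to transmit, which fits the text's description of the comparison result as side information for the decoder.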
  • Multiplexer 107 multiplexes the monaural signal encoding parameter input from monaural signal encoding unit 103, the L channel adaptive filter parameter and the R channel adaptive filter parameter input from stereo audio reconstruction unit 104, and the cross-correlation comparison result α input from cross-correlation comparison unit 106, and transmits the obtained bit stream to the stereo speech decoding apparatus 200.
  • FIG. 4 is a block diagram showing a main configuration inside stereo audio reconstructing section 104.
  • The L channel adaptive filter 141 includes an adaptive filter, and uses the L channel signal (L) and the monaural signal (M) input from the monaural signal generation unit 102 as the reference signal and the input signal, respectively.
  • It obtains an adaptive filter parameter that minimizes the mean square error between the reference signal and the input signal, and outputs it to the L channel synthesis filter 144 and the multiplexing unit 107.
  • Hereinafter, the adaptive filter parameter obtained by the L channel adaptive filter 141 is called the L channel adaptive filter parameter.
  • The L channel synthesis filter 144 performs a decoding process on the monaural decoded signal (M') input from the monaural signal decoding unit 143 using the L channel adaptive filter parameter input from the L channel adaptive filter 141, and outputs the obtained L channel reconstructed signal (L') to the reconstructed cross-correlation calculating unit 105.
  • The R channel synthesis filter 145 performs a decoding process on the monaural decoded signal (M') input from the monaural signal decoding unit 143 using the R channel adaptive filter parameter input from the R channel adaptive filter 142, and outputs the obtained R channel reconstructed signal (R') to the reconstructed cross-correlation calculating unit 105.
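The synthesis filters can be pictured as ordinary FIR filtering of the monaural decoded signal with the transmitted filter parameters. The sketch below assumes the adaptive filter parameters are FIR taps applied by direct convolution with zero initial state; the name `fir_filter` and the example values are hypothetical.

```python
def fir_filter(signal, taps):
    """Filter `signal` with FIR coefficients `taps`, assuming zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, b_k in enumerate(taps):
            if n - k >= 0:
                acc += b_k * signal[n - k]  # b_k * x(n - k)
        out.append(acc)
    return out


# Hypothetical monaural decoded frame and L channel filter taps.
mono_decoded = [1.0, 0.0, 0.0]
left_reconstructed = fir_filter(mono_decoded, [0.5, 0.25])
```

Running the same routine with the R channel taps would give the R channel reconstructed signal.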
  • FIG. 5 is a diagram for explaining the configuration and operation of the adaptive filter that constitutes the L-channel adaptive filter 141.
  • n indicates a sample number on the time axis.
  • FIR: Finite Impulse Response
  • x(n) represents the input signal of the adaptive filter.
  • Here, the monaural signal (M) input from the monaural signal generation unit 102 is used.
  • y(n) represents the reference signal of the adaptive filter.
  • Here, the L channel signal (L) is used.
  • e(n) represents the prediction error.
  • k represents the filter order.
  • The adaptive filter constituting the R channel adaptive filter 142 differs from the adaptive filter constituting the L channel adaptive filter 141 only in that the R channel signal (R) is input as the reference signal y(n).
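The text does not say which adaptation algorithm is used to minimize the mean square error. As one possible illustration, the sketch below uses normalized LMS (NLMS), a swapped-in standard technique rather than the patent's stated method, to adapt FIR taps so that the filtered monaural input x(n) tracks the channel reference y(n). The function name, step size `mu`, and test signal are all assumptions.

```python
import random


def nlms_fir(x, y, order, mu=0.5, eps=1e-8):
    """Adapt `order` FIR taps so that the filtered input x tracks the reference y.

    Returns the final taps and the per-sample prediction errors e(n).
    """
    taps = [0.0] * order
    history = [0.0] * order  # x(n), x(n-1), ..., x(n-order+1)
    errors = []
    for x_n, y_n in zip(x, y):
        history = [x_n] + history[:-1]
        y_hat = sum(b * h for b, h in zip(taps, history))
        e_n = y_n - y_hat  # prediction error e(n)
        norm = sum(h * h for h in history) + eps
        taps = [b + mu * e_n * h / norm for b, h in zip(taps, history)]
        errors.append(e_n)
    return taps, errors


# Hypothetical data: the "L channel" is a known 2-tap filtering of the mono signal,
# so the adapted taps should approach [0.8, 0.3].
rng = random.Random(0)
mono = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
left = [0.8 * mono[n] + (0.3 * mono[n - 1] if n > 0 else 0.0) for n in range(len(mono))]
taps, errors = nlms_fir(mono, left, order=2)
```

Because the taps capture both gain and delay of a channel relative to the monaural signal, they can carry level-difference and time-difference information, which matches the role the text assigns to the adaptive filter parameters.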
  • FIG. 6 is a flowchart showing an example of the procedure of stereo speech coding processing in stereo speech coding apparatus 100.
  • First, in step (hereinafter abbreviated as "ST") 151, the original cross-correlation calculation unit 101 calculates the cross-correlation coefficient C1 between the original L channel signal and R channel signal.
  • Next, monaural signal encoding section 103 encodes the monaural signal to generate a monaural signal encoding parameter.
  • In ST154, L channel adaptive filter 141 obtains an L channel adaptive filter parameter that minimizes the mean square error between the L channel signal and the monaural signal.
  • In ST155, the R channel adaptive filter 142 obtains an R channel adaptive filter parameter that minimizes the mean square error between the R channel signal and the monaural signal.
  • In ST156, monaural signal decoding section 143 performs decoding processing using the monaural signal encoding parameter, and generates a monaural decoded signal (M').
  • Next, the L channel synthesis filter 144 reconstructs the L channel signal using the monaural decoded signal (M') and the L channel adaptive filter parameter, and generates the L channel reconstructed signal (L').
  • Next, R channel synthesis filter 145 reconstructs the R channel signal using the monaural decoded signal (M') and the R channel adaptive filter parameter, and generates the R channel reconstructed signal (R').
  • Finally, cross-correlation comparison section 106 compares cross-correlation coefficient C1 with cross-correlation coefficient C2, and obtains cross-correlation comparison result α.
  • Stereo speech coding apparatus 100 transmits the adaptive filter parameters obtained in L channel adaptive filter 141 and R channel adaptive filter 142 to the stereo speech decoding apparatus 200 as spatial information parameters related to the inter-channel level difference (ILD) and the inter-channel time difference (ITD).
  • Stereo speech coding apparatus 100 also transmits the cross-correlation comparison result α obtained in cross-correlation comparing section 106 to stereo speech decoding apparatus 200 as a spatial information parameter regarding the inter-channel cross-correlation (ICC) between the L channel signal and the R channel signal.
  • Stereo speech coding apparatus 100 may transmit the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R) instead of the cross-correlation comparison result α. Even in this case, the decoder can calculate the cross-correlation coefficient C2 between the L channel reconstructed signal (L') and the R channel reconstructed signal (R'), so the cross-correlation comparison result α can be obtained by calculation at the decoder side.
  • In this case, the stereo speech coding apparatus 100 does not need to generate the L channel and R channel reconstructed signals, which reduces the amount of computation.
  • FIG. 7 is a block diagram showing the main configuration of stereo speech decoding apparatus 200.
  • Separating section 201 performs separation processing on the bit stream transmitted from stereo speech coding apparatus 100, outputs the obtained monaural signal encoding parameter, L channel adaptive filter parameter, and R channel adaptive filter parameter to stereo speech decoding section 202, and outputs the cross-correlation comparison result α to L channel spatial information reproduction section 205 and R channel spatial information reproduction section 206.
  • Stereo speech decoding section 202 decodes the L channel signal and the R channel signal using the monaural signal encoding parameter, the L channel adaptive filter parameter, and the R channel adaptive filter parameter input from separating section 201.
  • The obtained L channel reconstructed signal (L') is output to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205.
  • Stereo audio decoding section 202 outputs the R channel reconstructed signal (R') obtained by decoding to R channel all-pass filter 204 and R channel spatial information reproduction section 206. Details of the stereo audio decoding unit 202 will be described later.
  • The L channel all-pass filter 203 generates the L channel reverberation signal (L'_Rev) using the all-pass filter parameters representing the transfer function shown in equation (6) and the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and outputs it to the L channel spatial information reproduction unit 205.
  • In equation (6), the left-hand side represents the transfer function of the all-pass filter, and a = [a_1, a_2, ..., a_N] indicates the all-pass filter parameters.
  • The R channel all-pass filter 204 generates the R channel reverberation signal (R'_Rev) using the all-pass filter parameters representing the transfer function shown in equation (6) and the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and outputs it to the R channel spatial information reproduction unit 206.
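The transfer function of equation (6) is not reproduced in this text. To illustrate what an all-pass filter contributes, the sketch below uses a Schroeder all-pass section, H(z) = (-g + z^-D) / (1 - g z^-D). This is a substitute chosen for illustration, not the patent's filter, and the delay `D` and gain `g` values are assumptions. It changes the phase of the signal while leaving its energy unchanged, which is what makes the output usable as a decorrelated "reverberation" copy.

```python
def schroeder_allpass(x, delay, gain):
    """Schroeder all-pass section: y(n) = -g*x(n) + x(n-D) + g*y(n-D)."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    y = []
    for n, x_n in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-gain * x_n + x_d + gain * y_d)
    return y


# Impulse response of the section; its total energy should stay (almost exactly) 1.
impulse = [1.0] + [0.0] * 199
response = schroeder_allpass(impulse, delay=3, gain=0.5)
energy = sum(v * v for v in response)
```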
  • The R channel spatial information reproduction unit 206 uses the cross-correlation comparison result α input from the separation unit 201, the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and the R channel reverberation signal (R'_Rev) input from the R channel all-pass filter 204 to calculate and output the R channel decoded signal (R'') according to equation (8).
  • The numerator term of the cross-correlation between the two decoded signals is given by equation (11).
  • The signals entering the correlation calculations of the second to fourth terms on the right side of equation (11) are almost orthogonal, so these terms are much smaller than the first term and can be regarded as almost zero.
  • Therefore, using equations (4), (9), and (10) together with this approximation, the cross-correlation value between the L channel decoded signal (L'') and the R channel decoded signal (R'') becomes equal to the original cross-correlation value; that is, a two-channel decoded signal whose cross-correlation value equals that of the original stereo signal can be obtained.
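The mixing equations (7) and (8) are not reproduced in this text. The sketch below shows one mixing rule that has the property described above, assuming each decoded channel is sqrt(α) times the reconstructed signal plus sqrt(1 - α) times its reverberation signal, with the reverberation signals orthogonal to the reconstructed signals and to each other and carrying the same energy as their sources. Under those assumptions the cross-correlation of the decoded pair is α times that of the reconstructed pair. All names and data here are hypothetical.

```python
import math


def correlation(x, y):
    """Normalized cross-correlation coefficient (assumed standard form)."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def reproduce_spatial(recon, reverb, alpha):
    """Mix a reconstructed channel with its reverberation signal (assumed rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("this sketch assumes 0 <= alpha <= 1")
    direct_gain = math.sqrt(alpha)
    reverb_gain = math.sqrt(1.0 - alpha)
    return [direct_gain * s + reverb_gain * r for s, r in zip(recon, reverb)]


# Toy frames built so that every reverberation signal is exactly orthogonal to the
# reconstructed signals and to the other reverberation signal, with matching energy.
l_recon = [1.0, 1.0, 0.0, 0.0]
r_recon = [1.0, 0.5, 0.0, 0.0]
l_reverb = [0.0, 0.0, math.sqrt(2.0), 0.0]   # energy 2, same as l_recon
r_reverb = [0.0, 0.0, 0.0, math.sqrt(1.25)]  # energy 1.25, same as r_recon

alpha = 0.5
c_recon = correlation(l_recon, r_recon)
l_decoded = reproduce_spatial(l_recon, l_reverb, alpha)
r_decoded = reproduce_spatial(r_recon, r_reverb, alpha)
c_decoded = correlation(l_decoded, r_decoded)
```

With real signals the orthogonality only holds approximately, so the equality becomes the approximation the text describes.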
  • FIG. 8 is a block diagram showing the main configuration inside stereo audio decoding section 202.
  • The monaural signal decoding unit 221 performs decoding processing using the monaural signal encoding parameter input from the separation unit 201, and outputs the obtained monaural decoded signal (M') to the L channel synthesis filter 222 and the R channel synthesis filter 223.
  • The L channel synthesis filter 222 performs a decoding process that filters the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the L channel adaptive filter parameter input from the separation unit 201.
  • The obtained L channel reconstructed signal (L') is output to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205.
  • The R channel synthesis filter 223 performs a decoding process that filters the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the R channel adaptive filter parameter input from the separation unit 201.
  • The obtained R channel reconstructed signal (R') is output to the R channel all-pass filter 204 and the R channel spatial information reproduction unit 206.
  • FIG. 9 is a flowchart showing an example of a procedure of stereo speech decoding processing in stereo speech decoding apparatus 200.
  • First, separation section 201 performs separation processing on the bit stream transmitted from stereo speech coding apparatus 100, and obtains the monaural signal encoding parameter, the L channel adaptive filter parameter, the R channel adaptive filter parameter, and the cross-correlation comparison result α.
  • Next, monaural signal decoding section 221 decodes the monaural signal using the monaural signal encoding parameter to generate a monaural decoded signal (M').
  • Next, L channel synthesis filter 222 performs a decoding process that filters the monaural decoded signal (M') with the L channel adaptive filter parameter, and generates the L channel reconstructed signal (L').
  • Next, R channel synthesis filter 223 performs a decoding process that filters the monaural decoded signal (M') with the R channel adaptive filter parameter, and generates the R channel reconstructed signal (R').
  • Next, the L channel all-pass filter 203 generates an L channel reverberation signal (L'_Rev) using the L channel reconstructed signal (L').
  • Next, the R channel all-pass filter 204 generates an R channel reverberation signal (R'_Rev) using the R channel reconstructed signal (R').
  • Next, L channel spatial information reproduction section 205 generates the L channel decoded signal (L'') using the L channel reconstructed signal (L'), the L channel reverberation signal (L'_Rev), and the cross-correlation comparison result α.
  • Next, R channel spatial information reproduction section 206 generates the R channel decoded signal (R'') using the R channel reconstructed signal (R'), the R channel reverberation signal (R'_Rev), and the cross-correlation comparison result α.
  • As described above, according to this embodiment, the stereo speech coding apparatus transmits the L channel adaptive filter parameter and the R channel adaptive filter parameter, which are spatial information parameters regarding the inter-channel level difference (ILD) and the inter-channel time difference (ITD),
  • together with the cross-correlation comparison result α, which is spatial information related to the inter-channel cross-correlation (ICC).
  • Because the stereo speech decoding apparatus performs stereo speech decoding using these pieces of information, the spatial image of the decoded speech can be improved.
  • In this embodiment, the case where the L channel adaptive filter parameter and the R channel adaptive filter parameter are obtained and transmitted as spatial information parameters regarding the inter-channel level difference (ILD) and the inter-channel time difference (ITD) has been described as an example. However, the present invention is not limited to this, and a spatial information parameter indicating inter-channel difference information other than the L channel adaptive filter parameter and the R channel adaptive filter parameter may be obtained and transmitted.
  • ILD: inter-channel level difference
  • ITD: inter-channel time difference
  • Likewise, the case where the cross-correlation comparison unit 106 obtains the cross-correlation comparison result α according to equation (4) has been described as an example, but the present invention is not limited to this, and another comparison result that uniquely represents the difference between the cross-correlation coefficient C1 and the cross-correlation coefficient C2 may be obtained.
  • In addition, the case where the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) are generated in the L channel all-pass filter 203 and the R channel all-pass filter 204 using fixed all-pass filter parameters has been described as an example.
  • However, all-pass filter parameters transmitted from stereo speech coding apparatus 100 may be used instead.
  • FIG. 6 and FIG. 9 show, as an example of a procedure, the case where the processing of each step is performed serially.
  • However, some steps can be reordered or performed in parallel.
  • For example, although the L channel adaptive filter parameter is calculated in ST154 and the R channel adaptive filter parameter is calculated in ST155, the order of these two steps may be swapped so that the R channel adaptive filter parameter is calculated in ST154
  • and the L channel adaptive filter parameter is calculated in ST155, or the processing in ST154 and ST155 may be performed in parallel.
  • The decoding of the monaural signal performed in ST156 may be performed before ST154 or before ST155, and may be processed in parallel with ST154 or ST155.
  • ST151 may be performed at any timing from the start to ST159.
  • In this embodiment, the monaural decoded signal (M') generated by monaural signal decoding section 221 is not output to the outside of stereo audio decoding apparatus 200.
  • However, the present invention is not limited to this.
  • For example, the monaural decoded signal (M') may be output to the outside of the stereo audio decoding device 200 and used as the decoded audio of the stereo audio decoding device 200.
  • In this embodiment, stereo speech reconstruction unit 104 of stereo speech coding apparatus 100 uses the L channel adaptive filter parameter and the R channel adaptive filter parameter, obtained by encoding the L channel signal (L) and the R channel signal (R) with the monaural signal (M) input from the monaural signal generation unit 102, and the monaural decoded signal (M') obtained by decoding with the monaural signal encoding parameter input from the monaural signal encoding unit 103.
  • However, the present invention is not limited to this, and the stereo speech reconstruction unit 104 may instead encode the L channel signal (L) and the R channel signal (R) without relying on the monaural signal (M).
  • In that case, the stereo audio encoding device need not include the monaural signal generation unit 102 and the monaural signal encoding unit 103.
  • An L channel coding parameter and an R channel coding parameter are then generated, in place of the adaptive filter parameters, by the encoding process applied to the L channel signal (L) and the R channel signal (R) in the stereo speech reconstruction unit. For this reason, the bit stream output from this stereo speech coding apparatus need not include a monaural signal encoding parameter.
  • In this case, the stereo speech decoding apparatus 200 shown in FIG. 7 does not use a monaural signal encoding parameter. That is, when the monaural signal encoding parameter is not included in the bit stream, it is not output from the separation unit 201. Further, the stereo speech decoding unit 202 does not include the monaural signal decoding unit 221.
  • Instead, the L channel reconstructed signal (L') and the R channel reconstructed signal (R') may be obtained by applying, to the L channel coding parameter and the R channel coding parameter, a decoding process similar to the one performed within the stereo speech reconstruction unit of the corresponding stereo speech coding apparatus.
  • In Embodiment 1, the decoding side generates the L channel and R channel decoded signals using the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev).
  • However, the present invention is not limited to this, and a configuration that uses a monaural reverberation signal in place of the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) can also be used.
  • FIG. 10 is a block diagram showing the main configuration of stereo speech decoding apparatus 300 according to the present embodiment.
  • The configuration and operation of separation section 201 and stereo speech decoding section 202 are the same as those of separation section 201 and stereo speech decoding section 202 of stereo speech decoding apparatus 200 shown in FIG. 7, so their explanation is omitted.
  • The monaural signal generation unit 301 calculates and outputs a monaural reconstructed signal (M') using the L channel reconstructed signal (L') and the R channel reconstructed signal (R') input from the stereo speech decoding unit 202.
  • The monaural reconstructed signal (M') is calculated in the same manner as the monaural signal (M) in the monaural signal generation unit 102 of stereo speech coding apparatus 100.
  • The monaural signal all-pass filter 302 generates a monaural reverberation signal (M'_Rev) using the all-pass filter parameters and the monaural reconstructed signal (M') input from the monaural signal generation unit 301, and outputs it to the L channel spatial information reproduction unit 303 and the R channel spatial information reproduction unit 304.
  • As with the L channel all-pass filter 203 and the R channel all-pass filter 204 of stereo speech decoding apparatus 200, the all-pass filter parameters represent the transfer function shown in equation (6).
  • The L channel spatial information reproduction unit 303 uses the cross-correlation comparison result α input from the separation unit 201, the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and the monaural reverberation signal (M'_Rev) input from the monaural signal all-pass filter 302
  • to calculate and output the L channel decoded signal (L'') according to equation (14).
  • The R channel spatial information reproduction unit 304 uses the cross-correlation comparison result α input from the separation unit 201, the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and the monaural reverberation signal (M'_Rev) input from the monaural signal all-pass filter 302 to calculate and output the R channel decoded signal (R'') according to equation (15).
  • the L channel decoded signal is obtained from the orthogonality between L 'and M' and the orthogonality between R 'and M'.
  • the L channel spatial information reproduction unit 303 and the R channel spatial information reproduction unit 304 calculate the decoded signal using the cross-correlation comparison result ⁇ according to the equations (14) and (15),
  • the cross-correlation value between two channels is the same as the original cross-correlation value 1
  • the spatial information contained in the signal can be reproduced, and the spatial image of the decoded stereo audio signal can be improved.
  • the force S described by taking as an example the case where the monaural reconstructed signal ( ⁇ ') is calculated by the monaural signal generation unit 301 is not limited to this, and stereo audio decoding is performed.
  • unit 202 has a monaural signal decoding unit that decodes a monaural signal
  • monaural reconstructed signal ( ⁇ ′) may be obtained directly by stereo audio decoding unit 202.
  • In the above description the left channel is called the L channel and the right channel the R channel, but this notation does not restrict the positional relationship between left and right.
  • Although the stereo speech decoding apparatus in each of the above embodiments has been described as receiving and processing the bit stream transmitted by the stereo speech coding apparatus of that embodiment, the present invention is not limited to this.
  • The bit stream received and processed by the stereo speech decoding apparatus may be any bit stream transmitted by a coding apparatus capable of generating a bit stream that this decoding apparatus can process.
  • The stereo speech coding apparatus and stereo speech decoding apparatus according to the present invention can be mounted on a communication terminal apparatus in a mobile communication system, which provides a communication terminal apparatus having the same effects as described above.
  • Although the case where the present invention is configured by hardware has been described as an example, the present invention can also be realized by software.
  • For example, by describing the algorithm of the stereo speech coding method / decoding method according to the present invention in a programming language, storing the program in a memory, and executing it by an information processing means, functions similar to those of the stereo speech coding apparatus / decoding apparatus according to the present invention can be realized.
  • Each functional block used in the description of the above embodiments is typically realized as an LSI, which is an integrated circuit. These blocks may be made into individual chips, or a single chip may include some or all of them.
  • The method of circuit integration is not limited to LSI; implementation using dedicated circuitry or general-purpose processors is also possible.
  • For example, an FPGA (Field Programmable Gate Array) may be used.
  • The stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof according to the present invention can be applied to uses such as stereo speech coding in mobile communication terminals.
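The front end of the FIG. 10 decoder described above (deriving M′ from L′ and R′, then passing it through an all-pass filter to obtain a monaural reverberation signal) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patent's implementation: equation (6) is not reproduced in this passage, so the Schroeder-type all-pass section and its `delay` and `gain` values are assumptions, and both function names are hypothetical. The downmix follows the averaging form of equation (2).

```python
def monaural_from_stereo(left, right):
    """Average the two reconstructed channels, as in the encoder-side downmix."""
    return [0.5 * (l + r) for l, r in zip(left, right)]


def allpass_reverb(x, delay=3, gain=0.5):
    """Assumed Schroeder all-pass section:
    y[n] = -g*x[n] + x[n-D] + g*y[n-D], i.e. H(z) = (-g + z^-D) / (1 - g*z^-D).
    The delay D and gain g are illustrative values, not taken from the patent.
    """
    y = []
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-gain * x[n] + x_delayed + gain * y_delayed)
    return y
```

An all-pass section changes the phase of the signal but not its magnitude spectrum, which is why it can produce a decorrelated ("reverberation") version of M′ at the same energy.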

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)

Abstract

Disclosed is a stereo audio encoding device capable of improving the spatial image of decoded audio in stereo audio encoding. In this device, an original cross correlation calculation unit (101) calculates a cross correlation coefficient (C1) between the original L channel signal and the original R channel signal. A stereo audio reconfiguration unit (104) subjects the inputted L channel signal and R channel signal to encoding and decoding so as to generate an L channel reconfigured signal (L') and an R channel reconfigured signal (R'). A reconfiguration cross correlation calculation unit (105) calculates a cross correlation coefficient (C2) between the L channel reconfigured signal (L') and the R channel reconfigured signal (R'). A cross correlation comparison unit (106) calculates and outputs a comparison result α between the cross correlation coefficient (C1) and the cross correlation coefficient (C2).

Description

Specification
Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
Technical Field
[0001] The present invention relates to a stereo speech coding apparatus and a stereo speech decoding apparatus used to encode and decode stereo speech signals in, for example, a mobile communication system or a packet communication system using the Internet Protocol (IP), and to methods thereof.
Background Art
[0002] In mobile communication systems and packet communication systems using IP, faster digital signal processing in DSPs (Digital Signal Processors) and wider bandwidths have made high-bit-rate transmission possible. If transmission rates continue to rise, enough bandwidth to carry multiple channels (wideband) can be secured, so stereo communication is expected to spread even in speech communication, where monaural transmission is currently mainstream. Wideband stereo communication can encode information about a more natural sound environment, and playback through headphones or loudspeakers produces the spatial image that the listener perceives.
[0003] Binaural cue coding (BCC) is one technique for encoding the spatial information contained in a stereo audio signal. In binaural cue coding, the encoding side encodes a monaural signal generated by combining the signals of the multiple channels that make up the stereo audio signal, and also calculates and encodes the cues between the channel signals (inter-channel cues). Inter-channel cues are side information used to predict the channel signals from the monaural signal, and include the inter-channel level difference (ILD), the inter-channel time difference (ITD), and the inter-channel correlation (ICC). The decoding side decodes the monaural signal coding parameters to obtain a monaural decoded signal, generates a reverberation signal of the monaural decoded signal, and reconstructs the stereo audio signal using the monaural decoded signal, its reverberation signal, and the inter-channel cues.
[0004] Non-Patent Document 1 and Non-Patent Document 2 disclose examples of such techniques for encoding the spatial information contained in a stereo audio signal. FIG. 1 is a block diagram showing the main configuration of the stereo audio encoding apparatus 10 disclosed in Non-Patent Document 1. In FIG. 1, the monaural signal generation unit 11 generates a monaural signal (M) using the L channel signal and the R channel signal that make up the input stereo audio signal, and outputs it to the monaural signal encoding unit 12. The monaural signal encoding unit 12 encodes the monaural signal generated by the monaural signal generation unit 11 to produce monaural signal coding parameters, and outputs them to the multiplexing unit 14. The inter-channel cue calculation unit 13 calculates inter-channel cues, including the ILD, ITD, and ICC, between the input L channel signal and R channel signal, and outputs them to the multiplexing unit 14. The multiplexing unit 14 multiplexes the monaural signal coding parameters input from the monaural signal encoding unit 12 with the inter-channel cues input from the inter-channel cue calculation unit 13, and transmits the resulting bit stream to the stereo audio decoding apparatus 20.
[0005] FIG. 2 is a block diagram showing the main configuration of the stereo audio decoding apparatus 20 disclosed in Non-Patent Document 1. In FIG. 2, the separation unit 21 demultiplexes the bit stream transmitted from the stereo audio encoding apparatus 10, outputs the resulting monaural signal coding parameters to the monaural signal decoding unit 22, and outputs the resulting inter-channel cues to the first cue synthesis unit 24 and the second cue synthesis unit 25. The monaural signal decoding unit 22 performs decoding using the monaural signal coding parameters input from the separation unit 21, and outputs the resulting monaural decoded signal to the all-pass filter 23, the first cue synthesis unit 24, and the second cue synthesis unit 25. The all-pass filter 23 delays the monaural decoded signal input from the monaural signal decoding unit 22 by a predetermined time, and outputs the generated monaural reverberation signal (M′_Rev) to the first cue synthesis unit 24 and the second cue synthesis unit 25. The first cue synthesis unit 24 performs decoding using the inter-channel cues input from the separation unit 21, the monaural decoded signal input from the monaural signal decoding unit 22, and the monaural reverberation signal input from the all-pass filter 23, and outputs the resulting L channel decoded signal (L′). The second cue synthesis unit 25 performs decoding using the same three inputs and outputs the resulting R channel decoded signal (R′).
[0006] Conventional mobile phones can already incorporate a multimedia player with stereo capability and FM radio functions. Fourth-generation mobile phones, IP phones, and similar devices are expected to add functions such as recording and playback of stereo speech signals in addition to stereo audio signals.
Non-Patent Document 1: ISO/IEC 14496-3:2005 Part 3 Audio, 8.6.4 Parametric stereo
Non-Patent Document 2: ISO/IEC 23003-1:2006/FCD MPEG Surround (ISO/IEC 23003-1:2007 Part 1 MPEG Surround)
Disclosure of the Invention
Problems to Be Solved by the Invention
[0007] However, whereas stereo audio signal coding calculates and encodes three inter-channel cues (ILD, ITD, and ICC), stereo speech coding encodes only two (ILD and ITD). Because the ICC is important spatial information contained in a stereo speech signal, stereo speech generated at the decoding side without the ICC lacks a spatial image. To improve the spatial image of the stereo decoded signal, stereo speech coding therefore needs an additional configuration that encodes spatial information beyond the ILD and ITD.
[0008] An object of the present invention is to provide a stereo speech coding apparatus, a stereo speech decoding apparatus, and methods thereof that can improve the spatial image of decoded speech in stereo speech coding.
Means for Solving the Problem
[0009] The stereo speech coding apparatus of the present invention adopts a configuration comprising: a first calculation means that calculates a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo speech; a stereo speech reconstruction means that generates a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal; a second calculation means that calculates a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and a comparison means that obtains a cross-correlation comparison result containing spatial information of the stereo speech by comparing the first cross-correlation coefficient with the second cross-correlation coefficient.
[0010] The stereo speech decoding apparatus of the present invention adopts a configuration comprising: a separation means that obtains, from a received bit stream, (i) a first parameter and a second parameter that were generated in the coding apparatus and relate respectively to a first channel signal and a second channel signal constituting stereo speech, and (ii) a cross-correlation comparison result that contains spatial information about the stereo speech and was obtained by comparing a first cross-correlation, between the first channel signal and the second channel signal, with a second cross-correlation, between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal; a stereo speech decoding means that generates a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter; a stereo reverberation signal generation means that generates a first channel reverberation signal using the first channel reconstructed decoded signal and a second channel reverberation signal using the second channel reconstructed decoded signal; a first spatial information reproduction means that generates a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result; and a second spatial information reproduction means that generates a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result.
Effects of the Invention
[0011] According to the present invention, in stereo speech signal coding, two cross-correlation coefficients are compared as spatial information relating to the inter-channel cross-correlation (ICC), and the comparison result is transmitted to the stereo decoding side. This improves the spatial image of the decoded stereo speech signal.
Brief Description of Drawings
[0012] FIG. 1 is a block diagram showing the main configuration of a stereo audio encoding apparatus according to the prior art.
FIG. 2 is a block diagram showing the main configuration of a stereo audio decoding apparatus according to the prior art.
FIG. 3 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing the main internal configuration of the stereo speech reconstruction unit according to Embodiment 1 of the present invention.
FIG. 5 is a diagram illustrating the configuration and operation of the adaptive filter according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing an example of the stereo speech coding procedure in the stereo speech coding apparatus according to Embodiment 1 of the present invention.
FIG. 7 is a block diagram showing the main configuration of the stereo speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 8 is a block diagram showing the main internal configuration of the stereo speech decoding unit according to Embodiment 1 of the present invention.
FIG. 9 is a flowchart showing an example of the stereo speech decoding procedure in the stereo speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 10 is a block diagram showing the main configuration of the stereo speech decoding apparatus according to Embodiment 2 of the present invention.
Best Mode for Carrying Out the Invention
[0013] Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
[0014] Each embodiment is described taking as an example a stereo speech signal composed of a left (L) channel and a right (R) channel. The stereo speech coding apparatus according to each embodiment calculates the cross-correlation coefficient C1 between the input original L channel signal and R channel signal. It also includes a local stereo speech reconstruction unit, which reconstructs the L channel signal and the R channel signal, and it calculates the cross-correlation coefficient C2 between the reconstructed L channel signal and R channel signal. The apparatus compares C1 with C2 and transmits the comparison result α to the stereo speech decoding apparatus as spatial information contained in the stereo speech signal.
[0015] (Embodiment 1)
FIG. 3 is a block diagram showing the main configuration of the stereo speech coding apparatus 100 according to Embodiment 1 of the present invention. The stereo speech coding apparatus 100 performs stereo speech coding using the L channel signal and the R channel signal of the input stereo signal, and transmits the resulting bit stream to the stereo speech decoding apparatus 200 described later. Because the stereo speech decoding apparatus 200 corresponding to the stereo speech coding apparatus 100 can output a decoded signal of either the monaural signal or the stereo signal, monaural/stereo scalable coding is realized.
The original cross-correlation calculation unit 101 calculates the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R) constituting the stereo speech signal according to equation (1) below, and outputs it to the cross-correlation comparison unit 106.
[Equation 1] (image imgf000008_0001)
where
n: sample number on the time axis
L(n): L channel signal
R(n): R channel signal
C1: cross-correlation coefficient between the L channel signal and the R channel signal
The monaural signal generation unit 102 generates a monaural signal (M) from the L channel signal (L) and the R channel signal (R), for example according to equation (2) below, and outputs it to the monaural signal encoding unit 103 and the stereo speech reconstruction unit 104.
[Equation 2]
M(n) = \frac{1}{2}\left[L(n) + R(n)\right] \qquad (2)
where
n: sample number on the time axis
L(n): L channel signal
R(n): R channel signal
M(n): monaural signal
The monaural signal encoding unit 103 applies speech coding such as AMR-WB (Adaptive Multi-Rate Wideband) to the monaural signal input from the monaural signal generation unit 102, and outputs the resulting monaural signal coding parameters to the stereo speech reconstruction unit 104 and the multiplexing unit 107.
[0019] The stereo speech reconstruction unit 104 encodes the L channel signal (L) and the R channel signal (R) using the monaural signal (M) input from the monaural signal generation unit 102, and outputs the resulting L channel adaptive filter parameters and R channel adaptive filter parameters to the multiplexing unit 107. It also performs decoding using the obtained L channel adaptive filter parameters, the R channel adaptive filter parameters, and the monaural signal coding parameters input from the monaural signal encoding unit 103, and outputs the resulting L channel reconstructed signal (L′) and R channel reconstructed signal (R′) to the reconstructed cross-correlation calculation unit 105. Details of the stereo speech reconstruction unit 104 are given later.
[0020] The reconstructed cross-correlation calculation unit 105 calculates the cross-correlation coefficient C2 between the L channel reconstructed signal (L′) and the R channel reconstructed signal (R′) input from the stereo speech reconstruction unit 104 according to equation (3) below, and outputs it to the cross-correlation comparison unit 106.
[Equation 3] (image imgf000009_0001)
where
n: sample number on the time axis
L′(n): L channel reconstructed signal
R′(n): R channel reconstructed signal
C2: cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal
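As a rough illustration of the coefficients C1 and C2 defined above, the following Python sketch computes a cross-correlation coefficient for one frame of samples. Equations (1) and (3) are not reproduced in this text, so the conventional normalized form used here (zero-lag correlation divided by the geometric mean of the two channel energies) is an assumption, and the function name is hypothetical.

```python
import math


def cross_correlation_coefficient(a, b):
    """Normalized zero-lag cross-correlation of two equal-length frames.

    Assumed form: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    Returns 0.0 when either frame is silent, to avoid dividing by zero
    (this guard is an illustrative choice, not taken from the patent).
    """
    numerator = sum(p * q for p, q in zip(a, b))
    denominator = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return numerator / denominator if denominator > 0.0 else 0.0
```

Under this assumption the same function would serve both units: applied to L(n) and R(n) it gives C1, and applied to L′(n) and R′(n) it gives C2.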
[0021] The cross-correlation comparison unit 106 compares the cross-correlation coefficient C1 input from the original cross-correlation calculation unit 101 with the cross-correlation coefficient C2 input from the reconstructed cross-correlation calculation unit 105 according to equation (4) below, and outputs the cross-correlation comparison result α to the multiplexing unit 107.
[Equation 4] (image imgf000010_0001)
where
C1: cross-correlation coefficient between the L channel signal and the R channel signal
C2: cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal
α: cross-correlation comparison result
[0022] The cross-correlation value C2 between the reconstructed stereo signals is usually larger than the cross-correlation value C1 between the original stereo signals. In that case C2 is greater than C1 and |α| ≤ 1 holds, which makes the parameter well suited to quantization and transmission.
[0023] The multiplexing unit 107 multiplexes the monaural signal coding parameters input from the monaural signal encoding unit 103, the L channel adaptive filter parameters and R channel adaptive filter parameters input from the stereo speech reconstruction unit 104, and the cross-correlation comparison result α input from the cross-correlation comparison unit 106, and transmits the resulting bit stream to the stereo speech decoding apparatus 200.
[0024] FIG. 4 is a block diagram showing the main internal configuration of the stereo speech reconstruction unit 104.
[0025] Lチャネル適応フィルタ 141は、適応フィルタからなり、 Lチャネル信号(U、および モノラル信号生成部 102から入力されるモノラル信号 (Μ)をそれぞれ基準信号、お よび入力信号として用いて、基準信号と入力信号との平均二乗誤差が最小となるよう な適応フィルタパラメータを求め、 Lチャネル合成フィルタ 144、および多重部 107に 出力する。以下、 Lチャネル適応フィルタ 141において求められる適応フィルタパラメ ータを Lチャネル適応フィルタパラメータと称す。  [0025] The L channel adaptive filter 141 includes an adaptive filter, and uses the L channel signal (U and the monaural signal (Μ) input from the monaural signal generation unit 102 as a reference signal and an input signal, respectively. An adaptive filter parameter that minimizes the mean square error between the signal and the input signal is obtained and output to the L channel synthesis filter 144 and the multiplexing unit 107. Hereinafter, the adaptive filter parameter obtained by the L channel adaptive filter 141 is obtained. Is called the L channel adaptive filter parameter.
[0026] Rチャネル適応フィルタ 142は、適応フィルタからなり、 Rチャネル信号(R)、および モノラル信号生成部 102から入力されるモノラル信号 (Μ)をそれぞれ基準信号、お よび入力信号として用いて、基準信号と入力信号との平均二乗誤差が最小となるよう な適応フィルタパラメータを求め、 Rチャネル合成フィルタ 145、および多重部 107に 出力する。以下、 Rチャネル適応フィルタ 142において求められる適応フィルタパラメ ータを Rチャネル適応フィルタパラメータと称す。  [0026] The R channel adaptive filter 142 includes an adaptive filter, and uses the R channel signal (R) and the monaural signal (Μ) input from the monaural signal generation unit 102 as a reference signal and an input signal, respectively. An adaptive filter parameter that minimizes the mean square error between the reference signal and the input signal is obtained and output to the R channel synthesis filter 145 and the multiplexing unit 107. Hereinafter, the adaptive filter parameters required in the R channel adaptive filter 142 are referred to as R channel adaptive filter parameters.
[0027] The monaural signal decoding unit 143 performs speech decoding such as AMR-WB on the monaural signal encoding parameters input from the monaural signal encoding unit 103, and outputs the resulting monaural decoded signal (M') to the L channel synthesis filter 144 and the R channel synthesis filter 145.
[0028] The L channel synthesis filter 144 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 143 with the L channel adaptive filter parameters input from the L channel adaptive filter 141, and outputs the resulting L channel reconstructed signal (L') to the reconstructed cross-correlation calculation unit 105.
[0029] The R channel synthesis filter 145 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 143 with the R channel adaptive filter parameters input from the R channel adaptive filter 142, and outputs the resulting R channel reconstructed signal (R') to the reconstructed cross-correlation calculation unit 105.
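The filtering performed by the synthesis filters can be sketched as follows. This is a minimal illustration, assuming the synthesis filter is a direct-form FIR filter with zero initial state (the description names FIR only as an example model); the function name `fir_filter` and all sample values are hypothetical and do not come from the specification.

```python
def fir_filter(b, x):
    """Apply H(z) = b[0] + b[1] z^-1 + ... + b[k] z^-k to x, assuming zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(b):
            if n - i >= 0:              # samples before the start are treated as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


# Hypothetical values, for illustration only.
mono_decoded = [1.0, 0.5, -0.25, 0.0]   # stands in for the monaural decoded signal M'
b_left = [0.8, 0.2]                     # stands in for the L channel adaptive filter parameters
left_reconstructed = fir_filter(b_left, mono_decoded)  # stands in for L'
```

The R channel reconstructed signal would be obtained the same way from the same M', using the R channel adaptive filter parameters.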
[0030] FIG. 5 is a diagram for explaining the configuration and operation of the adaptive filter that constitutes the L channel adaptive filter 141. In this figure, n denotes the sample number on the time axis. $H(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_k z^{-k}$ is the model (transfer function) of the adaptive filter, for example an FIR (Finite Impulse Response) filter. Here, k denotes the order of the adaptive filter, and $b = [b_0, b_1, \ldots, b_k]$ denotes the adaptive filter parameters. x(n) denotes the input signal of the adaptive filter; for the L channel adaptive filter 141, this is the monaural signal (M) input from the monaural signal generation unit 102. y(n) denotes the reference signal of the adaptive filter; for the L channel adaptive filter 141, this is the L channel signal (L).
[0031] According to equation (5) below, the adaptive filter finds and outputs the adaptive filter parameters $b = [b_0, b_1, \ldots, b_k]$ that minimize the mean square error between the reference signal and the filtered input signal.
[Equation 5]
$$MSE(b) = E\{[e(n)]^2\} = E\{[y(n) - y'(n)]^2\} = E\left\{\left[y(n) - \sum_{i=0}^{k} b_i\, x(n-i)\right]^2\right\} \quad (5)$$
[0032] In this equation, E denotes the statistical expectation operator, e(n) denotes the prediction error, and k denotes the filter order.
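The minimization in equation (5) can be sketched as a batch least-squares fit. The specification does not say which adaptation algorithm is used, so the following is only one possible approach under stated assumptions: the expectation E is replaced by a sum over one frame, samples before the frame start are taken as zero, and the normal equations are solved by Gaussian elimination. The function name `estimate_fir_parameters` and the test signals are hypothetical.

```python
def estimate_fir_parameters(x, y, k):
    """Return b[0..k] minimising sum_n (y[n] - sum_i b[i] * x[n-i])**2 (assumed frame-wise fit)."""
    n_samples = len(x)
    size = k + 1

    def delayed(i, n):
        # x(n - i), with samples before the frame start assumed to be zero
        return x[n - i] if n - i >= 0 else 0.0

    # Normal equations A b = r built from the delayed input signal.
    a = [[sum(delayed(i, n) * delayed(j, n) for n in range(n_samples))
          for j in range(size)] for i in range(size)]
    r = [sum(delayed(i, n) * y[n] for n in range(n_samples)) for i in range(size)]

    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for c in range(col, size):
                a[row][c] -= factor * a[col][c]
            r[row] -= factor * r[col]

    # Back substitution.
    b = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][c] * b[c] for c in range(row + 1, size))
        b[row] = (r[row] - tail) / a[row][row]
    return b


# Hypothetical data: the "L channel" is the monaural signal passed through a known filter.
mono = [1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 0.75, -0.3]      # plays the role of x(n)
true_b = [0.7, 0.2, -0.1]
left = [sum(true_b[i] * mono[n - i] for i in range(3) if n - i >= 0)
        for n in range(len(mono))]                          # plays the role of y(n)
b_left = estimate_fir_parameters(mono, left, k=2)
```

Because the reference signal here is an exact FIR image of the input, the fit recovers `true_b` up to rounding; with real stereo signals the residual e(n) would generally be nonzero. The sketch does not handle a singular system (for example an all-zero input frame).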
[0033] The adaptive filter constituting the R channel adaptive filter 142 has the same configuration and operation as the adaptive filter constituting the L channel adaptive filter 141. It differs only in that the R channel signal (R) is input as the reference signal y(n).
[0034] FIG. 6 is a flowchart showing an example of the procedure of stereo speech encoding in the stereo speech coding apparatus 100.
[0035] First, in step (hereinafter abbreviated as "ST") 151, the original cross-correlation calculation unit 101 calculates the cross-correlation coefficient $C_1$ between the original L channel signal (L) and R channel signal (R).
[0036] Next, in ST152, the monaural signal generation unit 102 generates a monaural signal using the L channel signal and the R channel signal.
[0037] Next, in ST153, the monaural signal encoding unit 103 encodes the monaural signal to generate the monaural signal encoding parameters.
[0038] Next, in ST154, the L channel adaptive filter 141 finds the L channel adaptive filter parameters that minimize the mean square error between the L channel signal and the filtered monaural signal.
[0039] Next, in ST155, the R channel adaptive filter 142 finds the R channel adaptive filter parameters that minimize the mean square error between the R channel signal and the filtered monaural signal.
[0040] Next, in ST156, the monaural signal decoding unit 143 performs decoding using the monaural signal encoding parameters and generates the monaural decoded signal (M').
[0041] Next, in ST157, the L channel synthesis filter 144 reconstructs the L channel signal using the monaural decoded signal (M') and the L channel adaptive filter parameters, and generates the L channel reconstructed signal (L').
[0042] Next, in ST158, the R channel synthesis filter 145 reconstructs the R channel signal using the monaural decoded signal (M') and the R channel adaptive filter parameters, and generates the R channel reconstructed signal (R').
[0043] Next, in ST159, the reconstructed cross-correlation calculation unit 105 calculates the cross-correlation coefficient $C_2$ between the L channel reconstructed signal (L') and the R channel reconstructed signal (R').
[0044] Next, in ST160, the cross-correlation comparison unit 106 compares the cross-correlation coefficient $C_1$ with the cross-correlation coefficient $C_2$ and obtains the cross-correlation comparison result α.
[0045] Next, in ST161, the multiplexing unit 107 multiplexes and transmits the monaural signal encoding parameters, the L channel adaptive filter parameters, the R channel adaptive filter parameters, and the cross-correlation comparison result α.
[0046] As described above, the stereo speech coding apparatus 100 transmits the adaptive filter parameters found by the L channel adaptive filter 141 and the R channel adaptive filter 142 to the stereo speech decoding apparatus 200 as spatial information parameters relating to the inter-channel level difference (ILD) and the inter-channel time difference (ITD). It also transmits the cross-correlation comparison result α found by the cross-correlation comparison unit 106 to the stereo speech decoding apparatus 200 as a spatial information parameter relating to the inter-channel cross-correlation (ICC) between the L channel signal and the R channel signal.
[0047] In this embodiment, the stereo speech coding apparatus 100 may transmit the cross-correlation coefficient $C_1$ between the original L channel signal (L) and R channel signal (R) instead of the cross-correlation comparison result α. Even in this case, the decoder side can obtain the cross-correlation coefficient $C_2$ between the L channel reconstructed signal (L') and the R channel reconstructed signal (R'), so α can be obtained by calculation on the decoder side. The stereo speech coding apparatus 100 then no longer needs to generate the L channel and R channel reconstructed signals, which reduces the amount of computation.
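The decoder-side calculation of α described here can be sketched as follows. Two things are assumptions, because the defining equations are not part of this passage: the cross-correlation coefficient is taken to be the normalized inner product of the two channels, and equation (4) is taken to have the form α = sqrt(C_1 / C_2), which is the form implied by the later statement that the decoded cross-correlation equals C_1. Function names and sample values are hypothetical.

```python
import math


def cross_correlation_coefficient(a, b):
    """Normalized cross-correlation of two equal-length frames (assumed definition)."""
    numerator = sum(p * q for p, q in zip(a, b))
    denominator = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return numerator / denominator


def comparison_result(c1, c2):
    """Assumed form of equation (4): alpha = sqrt(C1 / C2)."""
    return math.sqrt(c1 / c2)


# Hypothetical decoder-side data: C1 arrives in the bitstream, L' and R' are decoded locally.
c1_received = 0.15
left_reconstructed = [1.0, 0.0]     # stands in for L'
right_reconstructed = [0.6, 0.8]    # stands in for R'

c2 = cross_correlation_coefficient(left_reconstructed, right_reconstructed)
alpha = comparison_result(c1_received, c2)
```

The sketch does not guard against a zero-energy frame or against C_1 / C_2 falling outside the range from 0 to 1, both of which a real decoder would have to handle.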
[0048] FIG. 7 is a block diagram showing the main configuration of the stereo speech decoding apparatus 200.
[0049] The separation unit 201 demultiplexes the bitstream transmitted from the stereo speech coding apparatus 100, outputs the resulting monaural signal encoding parameters, L channel adaptive filter parameters, and R channel adaptive filter parameters to the stereo speech decoding unit 202, and outputs the cross-correlation comparison result α to the L channel spatial information reproduction unit 205 and the R channel spatial information reproduction unit 206.
[0050] The stereo speech decoding unit 202 decodes the L channel signal and the R channel signal using the monaural signal encoding parameters, the L channel adaptive filter parameters, and the R channel adaptive filter parameters input from the separation unit 201, and outputs the resulting L channel reconstructed signal (L') to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205. The stereo speech decoding unit 202 also outputs the R channel reconstructed signal (R') obtained by decoding to the R channel all-pass filter 204 and the R channel spatial information reproduction unit 206. Details of the stereo speech decoding unit 202 are described later.
[0051] The L channel all-pass filter 203 generates the L channel reverberation signal ($L'_{Rev}$) using the all-pass filter parameters that represent the transfer function shown in equation (6) below and the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and outputs it to the L channel spatial information reproduction unit 205.
[Equation 6] (Figure imgf000014_0001: transfer function $H_{allpass}$ of the all-pass filter)
[0052] In this equation, $H_{allpass}$ denotes the transfer function of the all-pass filter, $a = [a_1, a_2, \ldots, a_N]$ denotes the all-pass filter parameters, and N denotes the order of the all-pass filter. Since the input signal L' and the output signal $L'_{Rev}$ of the L channel all-pass filter 203 are orthogonal, their cross-correlation value is $Correlation[L'(n), L'_{Rev}(n)] = 0$. Also, since the energy of L' and the energy of $L'_{Rev}$ are equal, $|L'(n)|^2 = |L'_{Rev}(n)|^2$.
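The energy-preserving property of an all-pass filter can be illustrated with a small sketch. Equation (6) is not reproduced in this text, so the code below does not implement it; it uses a generic first-order all-pass section, H(z) = (-a + z^-1) / (1 - a z^-1), chosen purely for illustration. The function names and the coefficient value are hypothetical, and the sketch demonstrates only the equal-energy property, not the orthogonality stated in the description.

```python
def first_order_allpass(x, a):
    """y[n] = -a*x[n] + x[n-1] + a*y[n-1], i.e. H(z) = (-a + z^-1) / (1 - a*z^-1)."""
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = -a * sample + prev_x + a * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)


# A unit impulse followed by zeros, long enough for the filter tail to decay.
impulse = [1.0] + [0.0] * 199
reverb = first_order_allpass(impulse, a=0.5)

# The output energy matches the input energy up to the truncated tail.
energy_in = energy(impulse)
energy_out = energy(reverb)
```

The magnitude response of such a section is 1 at every frequency, which is why the output energy matches the input energy while the phase, and therefore the waveform, changes.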
[0053] The R channel all-pass filter 204 generates the R channel reverberation signal ($R'_{Rev}$) using the all-pass filter parameters that represent the transfer function shown in equation (6) above and the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and outputs it to the R channel spatial information reproduction unit 206.
[0054] The L channel spatial information reproduction unit 205 calculates and outputs the L channel decoded signal (L'') according to equation (7) below, using the cross-correlation comparison result α input from the separation unit 201, the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and the L channel reverberation signal ($L'_{Rev}$) input from the L channel all-pass filter 203.
[0055] [Equation 7]
$$L'' = \alpha L' + \sqrt{1-\alpha^2}\, L'_{Rev} \quad (7)$$
The R channel spatial information reproduction unit 206 calculates and outputs the R channel decoded signal (R'') according to equation (8) below, using the cross-correlation comparison result α input from the separation unit 201, the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and the R channel reverberation signal ($R'_{Rev}$) input from the R channel all-pass filter 204.
[Equation 8]
$$R'' = \alpha R' + \sqrt{1-\alpha^2}\, R'_{Rev} \quad (8)$$
As described above, L' and $L'_{Rev}$ are orthogonal and have equal energy, so the energy of the L channel decoded signal (L'') is given by equation (9) below. Similarly, the energy of the R channel decoded signal (R'') is given by equation (10) below.
[Equation 9]
$$|L''|^2 = \alpha^2 |L'|^2 + (1-\alpha^2)\,|L'_{Rev}|^2 + 2\alpha\sqrt{1-\alpha^2}\,(L' \cdot L'_{Rev}) = |L'|^2 \quad (9)$$
$$|R''|^2 = \alpha^2 |R'|^2 + (1-\alpha^2)\,|R'_{Rev}|^2 + 2\alpha\sqrt{1-\alpha^2}\,(R' \cdot R'_{Rev}) = |R'|^2 \quad (10)$$
The numerator of the cross-correlation value $C_3$ between the L channel decoded signal (L'') and the R channel decoded signal (R'') is given by equation (11) below. If different filters are used for the L channel all-pass filter 203 and the R channel all-pass filter 204, the signal pairs in the second to fourth terms on the right side of equation (11) are nearly orthogonal, so those terms are very small compared with the first term and can be regarded as approximately zero. Therefore, from equations (4), (9), (10), and (11), the cross-correlation value $C_3$ between the L channel decoded signal (L'') and the R channel decoded signal (R'') equals the cross-correlation coefficient $C_1$ between the original L channel signal (L) and R channel signal (R), as shown in equation (12) below. Thus, by calculating the decoded signals with the cross-correlation comparison result α according to equations (7) and (8), the L channel spatial information reproduction unit 205 and the R channel spatial information reproduction unit 206 obtain decoded signals for the two channels whose inter-channel cross-correlation value equals the cross-correlation value of the original signals.
[Equation 11]
$$L'' \cdot R'' = \alpha^2 (L' \cdot R') + \alpha\sqrt{1-\alpha^2}\,(L' \cdot R'_{Rev}) + \alpha\sqrt{1-\alpha^2}\,(L'_{Rev} \cdot R') + (1-\alpha^2)(L'_{Rev} \cdot R'_{Rev}) \quad (11)$$
$$C_3 = \frac{\sum_n L''(n)\, R''(n)}{\sqrt{\sum_n L''(n)^2 \, \sum_n R''(n)^2}} \approx \frac{\alpha^2 \sum_n L'(n)\, R'(n)}{\sqrt{\sum_n L'(n)^2 \, \sum_n R'(n)^2}} = \alpha^2 C_2 = C_1 \quad (12)$$
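The mixing in equations (7) and (8) and its effect on energy and cross-correlation can be checked numerically. The sketch below is an illustration under idealized assumptions: the four signals are hand-built four-sample vectors in which each reverberation signal is exactly orthogonal to every other signal and has the same energy as its reconstructed counterpart, so the second to fourth terms of equation (11) are exactly zero rather than approximately zero. The normalized definition of the cross-correlation coefficient, the function names, and the values of `theta` and `alpha` are all assumptions for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length frames."""
    return sum(p * q for p, q in zip(a, b))


def reproduce(reconstructed, reverb, alpha):
    """Mix per equations (7)/(8): alpha * X' + sqrt(1 - alpha^2) * X'_Rev."""
    gain = math.sqrt(1.0 - alpha * alpha)
    return [alpha * s + gain * r for s, r in zip(reconstructed, reverb)]


def cross_correlation_coefficient(a, b):
    """Normalized cross-correlation (assumed definition)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


theta = 0.3   # hypothetical angle between L' and R'
alpha = 0.8   # hypothetical cross-correlation comparison result

l_rec = [1.0, 0.0, 0.0, 0.0]                          # L'
r_rec = [math.cos(theta), math.sin(theta), 0.0, 0.0]  # R'
l_rev = [0.0, 0.0, 1.0, 0.0]                          # L'_Rev, orthogonal to the rest
r_rev = [0.0, 0.0, 0.0, 1.0]                          # R'_Rev, orthogonal to the rest

l_dec = reproduce(l_rec, l_rev, alpha)                # L''
r_dec = reproduce(r_rec, r_rev, alpha)                # R''

c2 = cross_correlation_coefficient(l_rec, r_rec)
c3 = cross_correlation_coefficient(l_dec, r_dec)
```

In this idealized setting the energies of L'' and R'' equal those of L' and R', and `c3` equals `alpha**2 * c2`, so choosing α² = C_1 / C_2 would bring the decoded cross-correlation to C_1.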
[0059] FIG. 8 is a block diagram showing the main internal configuration of the stereo speech decoding unit 202.
[0060] The monaural signal decoding unit 221 performs decoding using the monaural signal encoding parameters input from the separation unit 201, and outputs the resulting monaural decoded signal (M') to the L channel synthesis filter 222 and the R channel synthesis filter 223.
[0061] The L channel synthesis filter 222 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the L channel adaptive filter parameters input from the separation unit 201, and outputs the resulting L channel reconstructed signal (L') to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205.
[0062] The R channel synthesis filter 223 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the R channel adaptive filter parameters input from the separation unit 201, and outputs the resulting R channel reconstructed signal (R') to the R channel all-pass filter 204 and the R channel spatial information reproduction unit 206.
[0063] FIG. 9 is a flowchart showing an example of the procedure of stereo speech decoding in the stereo speech decoding apparatus 200.
[0064] First, in ST251, the separation unit 201 demultiplexes the bitstream transmitted from the stereo speech coding apparatus 100 and obtains the monaural signal encoding parameters, the L channel adaptive filter parameters, the R channel adaptive filter parameters, and the cross-correlation comparison result α.
[0065] Next, in ST252, the monaural signal decoding unit 221 decodes the monaural signal using the monaural signal encoding parameters and generates the monaural decoded signal (M').
[0066] Next, in ST253, the L channel synthesis filter 222 performs decoding by filtering the monaural decoded signal (M') with the L channel adaptive filter parameters and generates the L channel reconstructed signal (L').
[0067] Next, in ST254, the R channel synthesis filter 223 performs decoding by filtering the monaural decoded signal (M') with the R channel adaptive filter parameters and generates the R channel reconstructed signal (R').
[0068] Next, in ST255, the L channel all-pass filter 203 generates the L channel reverberation signal ($L'_{Rev}$) using the L channel reconstructed signal (L').
[0069] Next, in ST256, the R channel all-pass filter 204 generates the R channel reverberation signal ($R'_{Rev}$) using the R channel reconstructed signal (R').
[0070] Next, in ST257, the L channel spatial information reproduction unit 205 generates the L channel decoded signal (L'') using the L channel reconstructed signal (L'), the L channel reverberation signal ($L'_{Rev}$), and the cross-correlation comparison result α.
[0071] Next, in ST258, the R channel spatial information reproduction unit 206 generates the R channel decoded signal (R'') using the R channel reconstructed signal (R'), the R channel reverberation signal ($R'_{Rev}$), and the cross-correlation comparison result α.
[0072] Thus, according to this embodiment, the stereo speech coding apparatus 100 transmits to the stereo speech decoding apparatus 200 not only the L channel adaptive filter parameters and the R channel adaptive filter parameters, which are spatial information parameters relating to the inter-channel level difference (ILD) and the inter-channel time difference (ITD), but also the cross-correlation comparison result α, which is spatial information relating to the inter-channel cross-correlation (ICC). Because the stereo speech decoding apparatus performs stereo speech decoding using this information, the spatial image of the decoded speech can be improved.
[0073] This embodiment has been described taking as an example the case where the L channel adaptive filter parameters and the R channel adaptive filter parameters are found and transmitted as the spatial information parameters relating to the inter-channel level difference (ILD) and the inter-channel time difference (ITD). The present invention is not limited to this, and spatial information parameters indicating inter-channel difference information other than the L channel adaptive filter parameters and the R channel adaptive filter parameters may be found and transmitted.
[0074] This embodiment has also been described taking as an example the case where the cross-correlation comparison unit 106 finds the cross-correlation comparison result according to equation (4) above. The present invention is not limited to this, and another comparison result that uniquely represents the difference between the cross-correlation coefficient $C_1$ and the cross-correlation coefficient $C_2$ may be found instead.
[0075] This embodiment has also been described taking as an example the case where the L channel all-pass filter 203 and the R channel all-pass filter 204 generate the L channel reverberation signal ($L'_{Rev}$) and the R channel reverberation signal ($R'_{Rev}$) using fixed all-pass filter parameters, but all-pass filter parameters transmitted from the stereo speech coding apparatus 100 may be used instead.
[0076] In this embodiment, FIGS. 6 and 9 show the steps processed serially as one example of the procedure, but some steps can be reordered or run in parallel. For example, the description has the L channel adaptive filter parameters calculated in ST154 and the R channel adaptive filter parameters calculated in ST155, but the order of these two steps may be swapped so that the R channel adaptive filter parameters are calculated in ST154 and the L channel adaptive filter parameters in ST155, or ST154 and ST155 may be processed in parallel. The decoding of the monaural signal performed in ST156 may take place before ST154 or before ST155, or may be processed in parallel with ST154 and ST155. Similarly, the order of ST157 and ST158, of ST253 and ST254, of ST255 and ST256, and of ST257 and ST258 may be swapped, or each pair may be processed in parallel. ST151 may be performed at any time between the start and ST159.
[0077] In FIGS. 7 and 8 of this embodiment, the monaural decoded signal (M') generated by the monaural signal decoding unit 221 is not output outside the stereo speech decoding apparatus 200, but the present invention is not limited to this. For example, when generation of the L channel decoded signal (L'') or the R channel decoded signal (R'') fails, the monaural decoded signal (M') may be output outside the stereo speech decoding apparatus 200 and used as its decoded speech.
[0078] This embodiment has been described taking as an example the case where the stereo speech reconstruction unit 104 of the stereo speech coding apparatus 100 obtains the L channel reconstructed signal (L') and the R channel reconstructed signal (R') using two inputs: the L channel adaptive filter parameters and the R channel adaptive filter parameters, obtained by encoding the L channel signal (L) and the R channel signal (R) respectively with the monaural signal (M) input from the monaural signal generation unit 102; and the monaural decoded signal (M'), obtained by decoding with the monaural signal encoding parameters input from the monaural signal encoding unit 103. The present invention is not limited to this. The stereo speech reconstruction unit 104 may obtain the L channel reconstructed signal (L') and the R channel reconstructed signal (R') by performing encoding and decoding on each of the L channel signal (L) and the R channel signal (R), without using the monaural signal (M) or the monaural signal encoding parameters. In that case, the stereo speech coding apparatus need not include the monaural signal generation unit 102 and the monaural signal encoding unit 103. Also in that case, L channel encoding parameters and R channel encoding parameters are generated by the encoding of the L channel signal (L) and the R channel signal (R) in the stereo speech reconstruction unit, in place of the L channel adaptive filter parameters and the R channel adaptive filter parameters. The bitstream output from this stereo speech coding apparatus therefore need not include monaural signal encoding parameters.
[0079] The stereo speech decoding apparatus corresponding to such a stereo speech coding apparatus is the stereo speech decoding apparatus 200 shown in FIG. 7, configured so as not to use monaural signal encoding parameters. That is, when the bitstream does not contain monaural signal encoding parameters, the separation unit 201 does not output them. Furthermore, the stereo speech decoding unit 202 may omit the monaural signal decoding unit 221 and obtain the L channel reconstructed signal (L') and the R channel reconstructed signal (R') by applying to the L channel encoding parameters and the R channel encoding parameters the same decoding as is performed in the stereo speech reconstruction unit of the corresponding stereo speech coding apparatus.
[0080] (Embodiment 2)
Embodiment 1 described a configuration in which the decoding side uses the L channel reverberation signal ($L'_{Rev}$) and the R channel reverberation signal ($R'_{Rev}$) to generate the L channel and R channel decoded signals. The present invention is not limited to this, and a monaural reverberation signal may be used in place of the L channel reverberation signal ($L'_{Rev}$) and the R channel reverberation signal ($R'_{Rev}$). Embodiment 2 describes the specific configuration and operation in that case.
[0081] The configuration and operation of the stereo speech coding apparatus according to this embodiment are the same as in Embodiment 1, except for the operation of the cross-correlation comparison unit 106 in FIG. 3. The cross-correlation comparison unit 106 according to this embodiment finds the cross-correlation comparison result α by equation (13) instead of equation (4).
[Equation 13] (Figure imgf000020_0001)
where $C_1$ is the cross-correlation coefficient between the L channel signal and the R channel signal, $C_2$ is the cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal, and α is the cross-correlation comparison result.
[0082] 図 10は、本実施の形態に係るステレオ音声復号装置 300の主要な構成を示すブ ロック図である。ここで、分離部 201およびステレオ音声復号部 202の構成および動 作は、実施の形態 1において図 7に示したステレオ音声復号装置 200の分離部 201 およびステレオ音声復号部 202の構成および動作と同様であるため、説明を省略す o FIG. 10 is a block diagram showing the main configuration of stereo speech decoding apparatus 300 according to the present embodiment. Here, the configuration and operation of separation section 201 and stereo speech decoding section 202 are the same as the configuration and operation of separation section 201 and stereo speech decoding section 202 of stereo speech decoding apparatus 200 shown in FIG. Therefore, the explanation is omitted o
[0083] モノラル信号生成部 301は、ステレオ音声復号部 202から入力される Lチャネル再 構築信号 (L' )および Rチャネル再構築信号 (R' )を用いて、モノラル再構築信号 (M ' )を算出して出力する。モノラル再構築信号 (Μ' )は、図 3のモノラル信号生成部 10 2におけるモノラル信号 (Μ)の算法と同様に算出される。  The monaural signal generation unit 301 uses the L channel reconstructed signal (L ′) and the R channel reconstructed signal (R ′) input from the stereo speech decoding unit 202 to generate a monaural reconstructed signal (M ′). Is calculated and output. The monaural reconstructed signal (Μ ′) is calculated in the same manner as the monaural signal (Μ) in the monaural signal generation unit 102 in FIG.
[0084] Monaural signal all-pass filter 302 generates a monaural reverberation signal (M'_Rev) using the all-pass filter parameters and the monaural reconstructed signal (M') input from monaural signal generation section 301, and outputs it to L channel spatial information reproduction section 303 and R channel spatial information reproduction section 304. Here, the all-pass filter parameters are represented by the transfer function shown in equation (6), as with L channel all-pass filter 203 and R channel all-pass filter 204 shown in FIG. 7 for Embodiment 1.
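To make the role of the all-pass filter more concrete, the sketch below applies a single Schroeder all-pass section to a monaural frame. This is a swapped-in, commonly used reverberator structure chosen for illustration. Equation (6) is defined in Embodiment 1 and is not reproduced here, so the structure, the function name `schroeder_allpass`, and the parameters `delay` and `gain` are all assumptions rather than the filter of this embodiment.

```python
from typing import List, Sequence


def schroeder_allpass(signal: Sequence[float], delay: int, gain: float) -> List[float]:
    """Apply y[n] = -g*x[n] + x[n-D] + g*y[n-D] (hypothetical stand-in for equation (6))."""
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if not -1.0 < gain < 1.0:
        raise ValueError("gain magnitude must be below 1 for stability")
    output: List[float] = []
    for n, x in enumerate(signal):
        # Samples before the start of the frame are treated as zero.
        x_delayed = signal[n - delay] if n >= delay else 0.0
        y_delayed = output[n - delay] if n >= delay else 0.0
        output.append(-gain * x + x_delayed + gain * y_delayed)
    return output


# Impulse response of the section: the energy is spread over time
# while the magnitude response stays flat.
impulse = [1.0] + [0.0] * 7
response = schroeder_allpass(impulse, delay=2, gain=0.5)
```

Because only the monaural reconstructed signal is filtered, one such filter run per frame would serve both channels in a decoder organized like FIG. 10.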
L channel spatial information reproduction section 303 calculates and outputs an L channel decoded signal (L'') according to equation (14) below, using cross-correlation comparison result α input from separation section 201, the L channel reconstructed signal (L') input from stereo speech decoding section 202, and the monaural reverberation signal (M'_Rev) input from monaural signal all-pass filter 302.
[Equation 14]
Figure imgf000021_0001
Similarly, R channel spatial information reproduction section 304 calculates and outputs an R channel decoded signal (R'') according to equation (15) below, using cross-correlation comparison result α input from separation section 201, the R channel reconstructed signal (R') input from stereo speech decoding section 202, and the monaural reverberation signal (M'_Rev) input from monaural signal all-pass filter 302.
[Equation 15]
R'' = αR' - … (15)
Here, since L' and M'_Rev can be regarded as approximately orthogonal, the energy of the L channel decoded signal (L'') is given by equation (16) below. Similarly, since R' and M'_Rev can be regarded as approximately orthogonal, the energy of the R channel decoded signal (R'') is given by equation (17) below.
[Equation 16]
Figure imgf000021_0002
… (17)
Also, from the orthogonality between L' and M'_Rev and between R' and M'_Rev, the numerator term of cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') is given by equation (18). Therefore, from equations (13), (16), (17), and (18), cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') becomes equal to cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R), as shown in equation (19). Thus, by calculating the decoded signals using cross-correlation comparison result α according to equations (14) and (15), L channel spatial information reproduction section 303 and R channel spatial information reproduction section 304 can obtain decoded signals whose inter-channel cross-correlation value is equal to the original cross-correlation value.
[Equation 18]
Figure imgf000022_0001
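The orthogonality argument rests on a standard identity, sketched here with generic weights. The weights a, b, c, and d are placeholders introduced for illustration only. They are not the coefficients of equations (14) and (15), which are not reproduced in this text.

```latex
% Energy of a weighted sum when L' and M'_{Rev} are approximately orthogonal
\sum_n \bigl(a\,L'(n) + b\,M'_{Rev}(n)\bigr)^2
  = a^2 \sum_n L'(n)^2 + 2ab \sum_n L'(n)\,M'_{Rev}(n) + b^2 \sum_n M'_{Rev}(n)^2
  \approx a^2 \sum_n L'(n)^2 + b^2 \sum_n M'_{Rev}(n)^2

% Cross term between two such sums when R' is also approximately orthogonal to M'_{Rev}
\sum_n \bigl(a\,L'(n) + b\,M'_{Rev}(n)\bigr)\bigl(c\,R'(n) + d\,M'_{Rev}(n)\bigr)
  \approx ac \sum_n L'(n)\,R'(n) + bd \sum_n M'_{Rev}(n)^2
```

The first line is the kind of simplification behind the energy expressions, and the second is the kind behind the numerator of the cross-correlation value: in both cases the mixed terms containing one reconstructed channel and the reverberation signal drop out.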
[0089] Thus, according to the present embodiment, when the L channel and R channel decoded signals are generated on the decoding side, the spatial information contained in the original stereo signal can be reproduced using the monaural reverberation signal (M'_Rev) instead of the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev), and the spatial image of the decoded stereo speech signal can be improved.
[0090] Also, according to the present embodiment, the decoding side only needs to generate a reverberation signal for the monaural signal instead of generating two reverberation signals, one for the L channel and one for the R channel, so the amount of computation for generating reverberation signals can be reduced.
[0091] Although the present embodiment has been described using an example in which monaural signal generation section 301 calculates the monaural reconstructed signal (M'), the present invention is not limited to this. When stereo speech decoding section 202 includes a monaural signal decoding section that decodes a monaural signal, as shown in FIG. 8, the monaural reconstructed signal (M') may be obtained directly from stereo speech decoding section 202.
[0092] The embodiments of the present invention have been described above.
[0093] In the above embodiments, the left channel has been described as the L channel and the right channel as the R channel, but it goes without saying that this notation does not limit the left-right positional relationship.
[0094] Also, the stereo speech decoding apparatus in each of the above embodiments has been described as receiving and processing a bitstream transmitted by the stereo speech coding apparatus in the corresponding embodiment, but the present invention is not limited to this. The bitstream received and processed by the stereo speech decoding apparatus in each of the above embodiments may be any bitstream transmitted by a coding apparatus capable of generating a bitstream that this decoding apparatus can process.
[0095] Also, the stereo speech coding apparatus and stereo speech decoding apparatus according to the present invention can be mounted on a communication terminal apparatus in a mobile communication system, which makes it possible to provide a communication terminal apparatus having the same operational effects as described above.
[0096] Also, although a case in which the present invention is configured by hardware has been described here as an example, the present invention can also be realized by software. For example, the same functions as the stereo speech coding apparatus and decoding apparatus according to the present invention can be realized by describing the algorithm of the stereo speech coding method or decoding method according to the present invention in a programming language, storing the program in memory, and having it executed by an information processing means.
[0097] Also, each functional block used in the description of the above embodiments is typically realized as an LSI, which is an integrated circuit. These blocks may each be made into an individual chip, or a single chip may contain some or all of them.
[0098] Although the term LSI is used here, the terms IC, system LSI, super LSI, and ultra LSI may also be used depending on the degree of integration.
[0099] Also, the method of circuit integration is not limited to LSI, and may be realized with dedicated circuitry or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.
[0100] Furthermore, if an integrated circuit technology that replaces LSI emerges as a result of progress in semiconductor technology or another technology derived from it, the functional blocks may naturally be integrated using that technology. The application of biotechnology is one such possibility.
[0101] The disclosures of Japanese Patent Application No. 2006-213634, filed on August 4, 2006, and Japanese Patent Application No. 2007-157759, filed on June 14, 2007, including the specifications, drawings, and abstracts, are incorporated herein by reference in their entirety.
Industrial Applicability
[0102] The stereo speech coding apparatus, the stereo speech decoding apparatus, and the methods thereof according to the present invention can be applied to uses such as stereo speech coding in mobile communication terminals.

Claims
[1] A stereo speech coding apparatus comprising:
first calculation means for calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo speech;
stereo speech reconstruction means for generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal;
second calculation means for calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and
comparison means for obtaining a cross-correlation comparison result including spatial information of the stereo speech by comparing the first cross-correlation coefficient with the second cross-correlation coefficient.
[2] The stereo speech coding apparatus according to claim 1, wherein:
the first calculation means calculates the first cross-correlation coefficient according to equation (1)
[Equation 1]
Figure imgf000025_0001
where n: sample number on the time axis
L(n): first channel signal
R(n): second channel signal
C1: cross-correlation coefficient between the first channel signal and the second channel signal;
the second calculation means calculates the second cross-correlation coefficient according to equation (2)
[Equation 2]
\[ C_2 = \frac{\sum_n L'(n)\,R'(n)}{\sqrt{\sum_n L'(n)^2 \, \sum_n R'(n)^2}} \qquad (2) \]
where n: sample number on the time axis
L'(n): first channel reconstructed signal
R'(n): second channel reconstructed signal
C2: cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and
the comparison means obtains the cross-correlation comparison result according to equation (3)
[Equation 3]
Figure imgf000026_0001
where C1: cross-correlation coefficient between the first channel signal and the second channel signal
C2: cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal
α: cross-correlation comparison result.
[3] The stereo speech coding apparatus according to claim 1, further comprising:
monaural signal generation means for generating a monaural signal using the first channel signal and the second channel signal; and
monaural signal coding means for generating a monaural signal coding parameter by coding the monaural signal,
wherein the stereo speech reconstruction means generates the first channel reconstructed signal and the second channel reconstructed signal by using the monaural signal and the monaural signal coding parameter for each of the first channel signal and the second channel signal.
[4] The stereo speech coding apparatus according to claim 3, wherein the stereo speech reconstruction means comprises:
a first adaptive filter that obtains a first adaptive filter parameter minimizing a mean square error between the monaural signal and the first channel signal;
a second adaptive filter that obtains a second adaptive filter parameter minimizing a mean square error between the monaural signal and the second channel signal;
monaural signal decoding means for generating a monaural decoded signal by decoding the monaural signal using the monaural signal coding parameter;
a first synthesis filter that generates the first channel reconstructed signal by filtering the monaural decoded signal with the first adaptive filter parameter; and
a second synthesis filter that generates the second channel reconstructed signal by filtering the monaural decoded signal with the second adaptive filter parameter.
[5] A stereo speech decoding apparatus comprising:
separation means for obtaining, from a received bitstream, a first parameter and a second parameter that are generated in a coding apparatus and relate respectively to a first channel signal and a second channel signal constituting stereo speech, and a cross-correlation comparison result that includes spatial information on the stereo speech and is obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;
stereo speech decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;
stereo reverberation signal generation means for generating a first channel reverberation signal using the first channel reconstructed decoded signal and generating a second channel reverberation signal using the second channel reconstructed decoded signal;
first spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result; and
second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result.
[6] The stereo speech decoding apparatus according to claim 5, wherein the stereo reverberation signal generation means comprises:
a first all-pass filter that generates the first channel reverberation signal by all-pass filtering the first channel reconstructed decoded signal; and
a second all-pass filter that generates the second channel reverberation signal by all-pass filtering the second channel reconstructed decoded signal.
[7] A stereo speech decoding apparatus comprising:
separation means for obtaining, from a received bitstream, a first parameter and a second parameter that are generated in a coding apparatus and relate respectively to a first channel signal and a second channel signal constituting stereo speech, and a cross-correlation comparison result that includes spatial information on the stereo speech and is obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;
stereo speech decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;
monaural reverberation signal generation means for generating a monaural reverberation signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal;
first spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result; and
second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result.
[8] The stereo speech decoding apparatus according to claim 7, wherein the monaural reverberation signal generation means comprises:
monaural signal generation means for generating a monaural reconstructed signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal; and
a monaural signal all-pass filter that generates the monaural reverberation signal by all-pass filtering the monaural reconstructed signal.
[9] A stereo speech coding method comprising the steps of:
calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo speech;
generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal;
calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and
obtaining a cross-correlation comparison result including spatial information of the stereo speech by comparing the first cross-correlation coefficient with the second cross-correlation coefficient.
[10] A stereo speech decoding method comprising the steps of:
obtaining, from a received bitstream, a first parameter and a second parameter that are generated in a coding apparatus and relate respectively to a first channel signal and a second channel signal constituting stereo speech, and a cross-correlation comparison result that includes spatial information on the stereo speech and is obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;
generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;
generating a first channel reverberation signal using the first channel reconstructed decoded signal and generating a second channel reverberation signal using the second channel reconstructed decoded signal;
generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result; and
generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result.
[11] A stereo speech decoding method comprising the steps of:
obtaining, from a received bitstream, a first parameter and a second parameter that are generated in a coding apparatus and relate respectively to a first channel signal and a second channel signal constituting stereo speech, and a cross-correlation comparison result that includes spatial information on the stereo speech and is obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;
generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;
generating a monaural reverberation signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal;
generating a first channel decoded signal using the first channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result; and
generating a second channel decoded signal using the second channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result.
PCT/JP2007/065132 2006-08-04 2007-08-02 Stereo audio encoding device, stereo audio decoding device, and method thereof WO2008016097A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US12/376,000 US8150702B2 (en) 2006-08-04 2007-08-02 Stereo audio encoding device, stereo audio decoding device, and method thereof
EP07791812.6A EP2048658B1 (en) 2006-08-04 2007-08-02 Stereo audio encoding device, stereo audio decoding device, and method thereof
JP2008527782A JP4999846B2 (en) 2006-08-04 2007-08-02 Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2006213634 2006-08-04
JP2006-213634 2006-08-04
JP2007-157759 2007-06-14
JP2007157759 2007-06-14

Publications (1)

Publication Number Publication Date
WO2008016097A1 true WO2008016097A1 (en) 2008-02-07

Family

ID=38997271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/065132 WO2008016097A1 (en) 2006-08-04 2007-08-02 Stereo audio encoding device, stereo audio decoding device, and method thereof

Country Status (4)

Country Link
US (1) US8150702B2 (en)
EP (1) EP2048658B1 (en)
JP (1) JP4999846B2 (en)
WO (1) WO2008016097A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009122757A1 (en) * 2008-04-04 2009-10-08 パナソニック株式会社 Stereo signal converter, stereo signal reverse converter, and methods for both
WO2010099752A1 (en) * 2009-03-04 2010-09-10 华为技术有限公司 Stereo coding method, device and encoder
WO2010108445A1 (en) * 2009-03-25 2010-09-30 华为技术有限公司 Method for estimating inter-channel delay and apparatus and encoder thereof

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2008132826A1 (en) * 2007-04-20 2010-07-22 パナソニック株式会社 Stereo speech coding apparatus and stereo speech coding method
WO2008132850A1 (en) * 2007-04-25 2008-11-06 Panasonic Corporation Stereo audio encoding device, stereo audio decoding device, and their method
JP5340261B2 (en) * 2008-03-19 2013-11-13 パナソニック株式会社 Stereo signal encoding apparatus, stereo signal decoding apparatus, and methods thereof
CN101556799B (en) * 2009-05-14 2013-08-28 华为技术有限公司 Audio decoding method and audio decoder
JP5333257B2 (en) * 2010-01-20 2013-11-06 富士通株式会社 Encoding apparatus, encoding system, and encoding method
TWI516138B (en) 2010-08-24 2016-01-01 杜比國際公司 System and method of determining a parametric stereo parameter from a two-channel audio signal and computer program product thereof
JP5533502B2 (en) * 2010-09-28 2014-06-25 富士通株式会社 Audio encoding apparatus, audio encoding method, and audio encoding computer program
US9183842B2 (en) * 2011-11-08 2015-11-10 Vixs Systems Inc. Transcoder with dynamic audio channel changing
JP5949270B2 (en) * 2012-07-24 2016-07-06 富士通株式会社 Audio decoding apparatus, audio decoding method, and audio decoding computer program
EP4327324A1 (en) * 2021-07-08 2024-02-28 Boomcloud 360, Inc. Colorless generation of elevation perceptual cues using all-pass filter networks

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1132399A (en) * 1997-05-13 1999-02-02 Sony Corp Coding method and system and recording medium
JP2002244698A (en) * 2000-12-14 2002-08-30 Sony Corp Device and method for encoding, device and method for decoding, and recording medium
JP2002344325A (en) * 2001-05-18 2002-11-29 Sony Corp Coding apparatus and method and recording medium
JP2004325633A (en) * 2003-04-23 2004-11-18 Matsushita Electric Ind Co Ltd Method and program for encoding signal, and recording medium therefor
JP2005202248A (en) * 2004-01-16 2005-07-28 Fujitsu Ltd Audio encoding device and frame region allocating circuit of audio encoding device
JP2005523480A (en) * 2002-04-22 2005-08-04 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Spatial audio parameter display
WO2006070757A1 (en) * 2004-12-28 2006-07-06 Matsushita Electric Industrial Co., Ltd. Audio encoding device and audio encoding method
JP2006213634A (en) 2005-02-03 2006-08-17 Mitsubishi Gas Chem Co Inc Phenanthrene quinone derivative and method for producing the same
JP2007157759A (en) 2005-11-30 2007-06-21 Fujitsu Ltd Piezoelectric element and its manufacturing method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356211B1 (en) * 1997-05-13 2002-03-12 Sony Corporation Encoding method and apparatus and recording medium
DE19742655C2 (en) * 1997-09-26 1999-08-05 Fraunhofer Ges Forschung Method and device for coding a discrete-time stereo signal
US6614365B2 (en) * 2000-12-14 2003-09-02 Sony Corporation Coding device and method, decoding device and method, and recording medium
US8209168B2 (en) * 2004-06-02 2012-06-26 Panasonic Corporation Stereo decoder that conceals a lost frame in one channel using data from another channel
US7756713B2 (en) * 2004-07-02 2010-07-13 Panasonic Corporation Audio signal decoding device which decodes a downmix channel signal and audio signal encoding device which encodes audio channel signals together with spatial audio information
CN101006495A (en) * 2004-08-31 2007-07-25 松下电器产业株式会社 Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method
BRPI0515551A (en) * 2004-09-17 2008-07-29 Matsushita Electric Ind Co Ltd audio coding apparatus, audio decoding apparatus, communication apparatus and audio coding method
SE0402652D0 (en) * 2004-11-02 2004-11-02 Coding Tech Ab Methods for improved performance of prediction based multi-channel reconstruction
BRPI0516376A (en) 2004-12-27 2008-09-02 Matsushita Electric Ind Co Ltd sound coding device and sound coding method
JP4850827B2 (en) * 2005-04-28 2012-01-11 パナソニック株式会社 Speech coding apparatus and speech coding method
WO2006121101A1 (en) * 2005-05-13 2006-11-16 Matsushita Electric Industrial Co., Ltd. Audio encoding apparatus and spectrum modifying method
WO2007088853A1 (en) * 2006-01-31 2007-08-09 Matsushita Electric Industrial Co., Ltd. Audio encoding device, audio decoding device, audio encoding system, audio encoding method, and audio decoding method

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
JPH1132399A (en) * 1997-05-13 1999-02-02 Sony Corp Coding method and system and recording medium
JP2002244698A (en) * 2000-12-14 2002-08-30 Sony Corp Device and method for encoding, device and method for decoding, and recording medium
JP2002344325A (en) * 2001-05-18 2002-11-29 Sony Corp Coding apparatus and method and recording medium
JP2005523480A (en) * 2002-04-22 2005-08-04 Koninklijke Philips Electronics N.V. Spatial audio parameter display
JP2004325633A (en) * 2003-04-23 2004-11-18 Matsushita Electric Ind Co Ltd Method and program for encoding signal, and recording medium therefor
JP2005202248A (en) * 2004-01-16 2005-07-28 Fujitsu Ltd Audio encoding device and frame region allocating circuit of audio encoding device
WO2006070757A1 (en) * 2004-12-28 2006-07-06 Matsushita Electric Industrial Co., Ltd. Audio encoding device and audio encoding method
JP2006213634A (en) 2005-02-03 2006-08-17 Mitsubishi Gas Chem Co Inc Phenanthrene quinone derivative and method for producing the same
JP2007157759A (en) 2005-11-30 2007-06-21 Fujitsu Ltd Piezoelectric element and its manufacturing method

Cited By (6)

Publication number Priority date Publication date Assignee Title
WO2009122757A1 (en) * 2008-04-04 2009-10-08 Panasonic Corporation Stereo signal converter, stereo signal reverse converter, and methods for both
WO2010099752A1 (en) * 2009-03-04 2010-09-10 Huawei Technologies Co., Ltd. Stereo coding method, device and encoder
US9064488B2 (en) 2009-03-04 2015-06-23 Huawei Technologies Co., Ltd. Stereo encoding method, stereo encoding device, and encoder
WO2010108445A1 (en) * 2009-03-25 2010-09-30 Huawei Technologies Co., Ltd. Method for estimating inter-channel delay and apparatus and encoder thereof
CN101848412B (en) * 2009-03-25 2012-03-21 Huawei Technologies Co., Ltd. Method and device for estimating interchannel delay and encoder
US8417473B2 (en) 2009-03-25 2013-04-09 Huawei Technologies Co., Ltd. Method for estimating inter-channel delay and apparatus and encoder thereof

Also Published As

Publication number Publication date
US20090299734A1 (en) 2009-12-03
EP2048658B1 (en) 2013-10-09
EP2048658A1 (en) 2009-04-15
JPWO2008016097A1 (en) 2009-12-24
EP2048658A4 (en) 2012-07-11
JP4999846B2 (en) 2012-08-15
US8150702B2 (en) 2012-04-03

Similar Documents

Publication Publication Date Title
JP4999846B2 (en) Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
JP4875142B2 (en) Method and apparatus for a decoder for multi-channel surround sound
TWI387351B (en) Encoder, decoder and the related methods thereof
US7805313B2 (en) Frequency-based coding of channels in parametric multi-channel coding systems
JP4939933B2 (en) Audio signal encoding apparatus and audio signal decoding apparatus
JP5455647B2 (en) Audio decoder
TWI424756B (en) Binaural rendering of a multi-channel audio signal
JP5243527B2 (en) Acoustic encoding apparatus, acoustic decoding apparatus, acoustic encoding/decoding apparatus, and conference system
TWI490853B (en) Multi-channel audio processing
JP5227946B2 (en) Filter adaptive frequency resolution
JP2011008258A (en) High quality multi-channel audio encoding apparatus and decoding apparatus
EP1969901A2 (en) Personalized decoding of multi-channel surround sound
EP2427881A1 (en) Multi channel audio processing
JP7311601B2 (en) Apparatus, method and computer program for encoding, decoding, scene processing and other procedures for DirAC-based spatial audio coding with direct component compensation
WO2010125228A1 (en) Encoding of multiview audio signals
GB2580899A (en) Audio representation and associated rendering
CN102027535A (en) Processing of signals
KR20110002086A (en) An apparatus
KR100636145B1 (en) Extended high resolution audio signal encoder and decoder thereof
WO2008016098A1 (en) Stereo audio encoding device, stereo audio decoding device, and method thereof
WO2009122757A1 (en) Stereo signal converter, stereo signal reverse converter, and methods for both
JPWO2008132826A1 (en) Stereo speech coding apparatus and stereo speech coding method
WO2007010844A1 (en) Relay device, communication terminal, signal decoder, signal processing method, and signal processing program
JPWO2008090970A1 (en) Stereo encoding apparatus, stereo decoding apparatus, and methods thereof
WO2010134332A1 (en) Encoding device, decoding device, and methods therefor

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 07791812; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2008527782; Country of ref document: JP
WWE WIPO information: entry into national phase. Ref document number: 2007791812; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
NENP Non-entry into the national phase. Ref country code: RU
WWE WIPO information: entry into national phase. Ref document number: 12376000; Country of ref document: US