WO2008016097A1

WO2008016097A1 - Stereo audio encoding device, stereo audio decoding device, and method thereof

Info

Publication number: WO2008016097A1
Application number: PCT/JP2007/065132
Authority: WO
Inventors: Jiong Zhou; Kok Seng Chong
Original assignee: Panasonic Corporation
Priority date: 2006-08-04
Filing date: 2007-08-02
Publication date: 2008-02-07
Also published as: US20090299734A1; EP2048658B1; EP2048658A1; JPWO2008016097A1; EP2048658A4; JP4999846B2; US8150702B2

Abstract

Disclosed is a stereo audio encoding device capable of improving the spatial image of decoded audio in stereo audio encoding. In this device, an original cross-correlation calculation unit (101) calculates a cross-correlation coefficient (C1) between the original L channel signal and the original R channel signal. A stereo audio reconstruction unit (104) subjects the input L channel signal and R channel signal to encoding and decoding so as to generate an L channel reconstructed signal (L') and an R channel reconstructed signal (R'). A reconstructed cross-correlation calculation unit (105) calculates a cross-correlation coefficient (C2) between the L channel reconstructed signal (L') and the R channel reconstructed signal (R'). A cross-correlation comparison unit (106) calculates and outputs a comparison result α between the cross-correlation coefficient (C1) and the cross-correlation coefficient (C2).

Description

Specification

Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof

Technical field

[0001] The present invention relates to a stereo speech coding apparatus and a stereo speech decoding apparatus used for encoding and decoding a stereo speech signal in a mobile communication system or a packet communication system using the Internet Protocol (IP), and to methods thereof.

Background art

[0002] In mobile communication systems and packet communication systems using IP, high-bit-rate transmission has become possible thanks to the faster digital signal processing of DSPs (Digital Signal Processors) and to wider bandwidth. If the transmission rate increases further, it will be possible to secure a band wide enough to transmit multiple channels, so stereo communication is expected to become widespread even in voice communication, where monaural transmission has been the norm. Wideband stereo communication can encode information about a more natural sound environment and, when played back through headphones or speakers, creates a spatial image that the listener perceives.

[0003] Binaural cue coding (BCC) is one technique for encoding the spatial information contained in a stereo audio signal. In binaural cue coding, the encoding side encodes a monaural signal generated by combining the signals of the multiple channels that make up the stereo audio signal, and also calculates and encodes the cues between the channel signals (inter-channel cues). Inter-channel cues are side information used to predict the channel signals from the monaural signal; they include the inter-channel level difference (ILD), the inter-channel time difference (ITD), and the inter-channel correlation (ICC). The decoding side decodes the monaural signal encoding parameters to obtain a monaural decoded signal and generates a reverberation signal of the monaural decoded signal. It then reconstructs the stereo audio signal using the monaural decoded signal, its reverberation signal, and the inter-channel cues. Non-Patent Document 1 and Non-Patent Document 2 disclose examples of such techniques for encoding the spatial information contained in a stereo audio signal.

FIG. 1 is a block diagram showing the main configuration of stereo audio encoding apparatus 10 disclosed in Non-Patent Document 1. In FIG. 1, the monaural signal generation unit 11 generates a monaural signal (M) using the L channel signal and the R channel signal that constitute the input stereo audio signal, and outputs the monaural signal (M) to the monaural signal encoding unit 12. The monaural signal encoding unit 12 encodes the monaural signal generated by the monaural signal generation unit 11 to generate monaural signal encoding parameters, and outputs them to the multiplexing unit 14. The inter-channel cue calculation unit 13 calculates inter-channel cues, including the ILD, ITD, and ICC of the input L channel signal and R channel signal, and outputs them to the multiplexing unit 14. The multiplexing unit 14 multiplexes the monaural signal encoding parameters input from the monaural signal encoding unit 12 and the inter-channel cues input from the inter-channel cue calculation unit 13, and transmits the resulting bit stream to stereo audio decoding apparatus 20.

FIG. 2 is a block diagram showing the main configuration of stereo audio decoding apparatus 20 disclosed in Non-Patent Document 1. In FIG. 2, the separation unit 21 performs separation processing on the bit stream transmitted from the stereo audio encoding apparatus 10, outputs the obtained monaural signal encoding parameters to the monaural signal decoding unit 22, and outputs the obtained inter-channel cues to the first cue synthesis unit 24 and the second cue synthesis unit 25. The monaural signal decoding unit 22 performs decoding using the monaural signal encoding parameters input from the separation unit 21, and outputs the obtained monaural decoded signal to the all-pass filter 23, the first cue synthesis unit 24, and the second cue synthesis unit 25. The all-pass filter 23 delays the monaural decoded signal input from the monaural signal decoding unit 22 by a predetermined time, and outputs the resulting monaural reverberation signal (M'_Rev) to the first cue synthesis unit 24 and the second cue synthesis unit 25.

The first cue synthesis unit 24 performs decoding using the inter-channel cues input from the separation unit 21, the monaural decoded signal input from the monaural signal decoding unit 22, and the monaural reverberation signal input from the all-pass filter 23, and outputs the obtained L channel decoded signal (L'). The second cue synthesis unit 25 performs decoding using the inter-channel cues input from the separation unit 21, the monaural decoded signal input from the monaural signal decoding unit 22, and the monaural reverberation signal input from the all-pass filter 23, and outputs the obtained R channel decoded signal (R').

[0006] Conventional mobile phones can already be equipped with a multimedia player having a stereo function and with an FM radio function. Furthermore, functions such as recording and playback of stereo audio signals are expected to be added to fourth-generation mobile phones and IP phones.

Non-Patent Document 1: ISO/IEC 14496-3:2005 Part 3 Audio, 8.6.4 Parametric stereo

Non-Patent Document 2: ISO/IEC 23003-1:2006/FCD MPEG Surround (ISO/IEC 23003-1:2007 Part 1 MPEG Surround)

Disclosure of the invention

Problems to be solved by the invention

[0007] However, whereas stereo audio signal coding calculates and encodes three inter-channel cues (ILD, ITD, and ICC), stereo speech coding encodes only two inter-channel cues, ILD and ITD. Since ICC is important spatial information contained in a stereo signal, stereo speech generated on the decoding side without using ICC lacks a spatial image. Therefore, in order to improve the spatial image of the stereo decoded signal, stereo speech coding needs an additional configuration that encodes spatial information beyond ILD and ITD.

An object of the present invention is to provide a stereo speech coding apparatus, a stereo speech decoding apparatus, and a method thereof that can improve a spatial image of decoded speech in stereo speech coding.

Means for solving the problem

[0009] The stereo speech coding apparatus according to the present invention adopts a configuration including: first calculation means for calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo speech; stereo speech reconstructing means for generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal; second calculation means for calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal; and comparing means for obtaining a cross-correlation comparison result including spatial information of the stereo speech by comparing the first cross-correlation coefficient with the second cross-correlation coefficient.

[0010] Further, the stereo speech decoding apparatus of the present invention adopts a configuration including: separation means for obtaining, from a received bit stream, a first parameter and a second parameter generated in the coding apparatus for a first channel signal and a second channel signal constituting stereo speech, respectively, and a cross-correlation comparison result that includes spatial information about the stereo speech and is obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal; stereo speech decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter; stereo reverberation signal generating means for generating a first channel reverberation signal using the first channel reconstructed decoded signal and a second channel reverberation signal using the second channel reconstructed decoded signal; first spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result; and second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result.

[0011] According to the present invention, in stereo audio signal encoding, two cross-correlation coefficients are compared to obtain spatial information related to inter-channel cross-correlation (ICC), and the comparison result is transmitted to the stereo decoding side, so that the spatial image of the decoded stereo audio signal can be improved.

Brief Description of Drawings

FIG. 1 is a block diagram showing the main configuration of a stereo audio encoding device according to the prior art.

FIG. 2 is a block diagram showing the main configuration of a stereo audio decoding device according to the prior art.

FIG. 3 is a block diagram showing the main configuration of the stereo speech coding apparatus according to Embodiment 1 of the present invention.

FIG. 4 is a block diagram showing a main configuration inside a stereo speech reconstruction unit according to Embodiment 1 of the present invention.

FIG. 5 is a diagram for illustrating the configuration and operation of an adaptive filter according to Embodiment 1 of the present invention.

FIG. 6 is a flowchart showing an example of the procedure of stereo speech coding processing in the stereo speech coding apparatus according to Embodiment 1 of the present invention.

FIG. 7 is a block diagram showing the main configuration of the stereo speech decoding apparatus according to Embodiment 1 of the present invention.

FIG. 8 is a block diagram showing a main configuration inside a stereo speech decoding unit according to Embodiment 1 of the present invention.

FIG. 9 is a flowchart showing an example of a procedure of stereo audio decoding processing in the stereo audio decoding device according to Embodiment 1 of the present invention.

FIG. 10 is a block diagram showing the main configuration of a stereo speech decoding apparatus according to Embodiment 2 of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

In each embodiment, a case where a stereo audio signal is composed of two channels, a left (L) channel and a right (R) channel, will be described as an example. The stereo speech coding apparatus according to each embodiment calculates the cross-correlation coefficient C1 between the input original L channel signal and R channel signal. The stereo speech coding apparatus according to each embodiment also includes a local stereo speech reconstructing unit, reconstructs the L channel signal and the R channel signal, and calculates the cross-correlation coefficient C2 between the reconstructed L channel signal and R channel signal. The stereo speech coding apparatus according to each embodiment then compares the cross-correlation coefficient C1 with the cross-correlation coefficient C2, and transmits the comparison result α to the stereo speech decoding apparatus as spatial information contained in the stereo audio signal.

[0015] (Embodiment 1)

FIG. 3 is a block diagram showing the main configuration of stereo speech coding apparatus 100 according to Embodiment 1 of the present invention. The stereo speech coding apparatus 100 performs stereo speech coding processing using the input L channel signal and R channel signal of the stereo signal, and transmits the resulting bit stream to a stereo speech decoding apparatus 200 described later. Note that the stereo speech decoding apparatus 200 corresponding to the stereo speech coding apparatus 100 can output a decoded signal of either a monaural signal or a stereo signal, thereby realizing monaural/stereo scalable coding.

The original cross-correlation calculation unit 101 calculates the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R) constituting the stereo audio signal according to the following equation (1), and outputs it to the cross-correlation comparison unit 106.

[Equation 1]

$$C_1 = \frac{\sum_n L(n)\,R(n)}{\sqrt{\sum_n L(n)^2 \, \sum_n R(n)^2}} \quad \cdots (1)$$

where

n: sample number on the time axis

L(n): L channel signal

R(n): R channel signal

C1: cross-correlation coefficient between the L channel signal and the R channel signal

The monaural signal generation unit 102 generates a monaural signal (M) using the L channel signal (L) and the R channel signal (R), for example according to the following equation (2), and outputs the generated monaural signal (M) to the monaural signal encoding unit 103 and the stereo audio reconstruction unit 104.

[Equation 2]

$$M(n) = \frac{1}{2}\,[L(n) + R(n)] \quad \cdots (2)$$

where

n: sample number on the time axis

L(n): L channel signal

R(n): R channel signal

M(n): monaural signal

The monaural signal encoding unit 103 performs speech encoding such as AMR-WB (Adaptive Multi-Rate Wideband) on the monaural signal input from the monaural signal generation unit 102, and outputs the obtained monaural signal encoding parameters to the stereo speech reconstruction unit 104 and the multiplexing unit 107.

Stereo audio reconstructing section 104 encodes the L channel signal (L) and the R channel signal (R) using the monaural signal (M) input from monaural signal generating section 102, and outputs the obtained L channel adaptive filter parameters and R channel adaptive filter parameters to multiplexing section 107. Stereo audio reconstructing section 104 also performs decoding using the obtained L channel adaptive filter parameters, the R channel adaptive filter parameters, and the monaural signal encoding parameters input from monaural signal encoding section 103, and outputs the obtained L channel reconstructed signal (L') and R channel reconstructed signal (R') to the reconstructed cross-correlation calculating unit 105. Details of the stereo audio reconstruction unit 104 will be described later.
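As an illustration of equations (1) and (2), the following minimal sketch computes the monaural downmix and a normalized cross-correlation coefficient for one frame. The function names, the frame-wise processing, and the zero-energy guard are assumptions made for this example; the normalized form of the coefficient is also an assumption, chosen to agree with the decoder-side derivation later in the description.

```python
import math

def downmix(left, right):
    """Monaural signal per equation (2): M(n) = (L(n) + R(n)) / 2."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def cross_correlation(a, b):
    """Normalized cross-correlation of two equal-length frames.

    Assumed form: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    Returning 0.0 for a silent frame is a guard added for this sketch only.
    """
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0

# Hypothetical four-sample frame, for illustration only.
left = [1.0, 2.0, 3.0, 4.0]
right = [2.0, 1.0, 4.0, 3.0]
mono = downmix(left, right)          # monaural signal M
c1 = cross_correlation(left, right)  # coefficient C1 of the original channels
```

The same `cross_correlation` helper can be applied to the reconstructed signals to obtain C2.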

[0020] The reconstructed cross-correlation calculating unit 105 calculates the cross-correlation coefficient C2 between the L channel reconstructed signal (L') and the R channel reconstructed signal (R') input from the stereo speech reconstructing unit 104 according to the following equation (3), and outputs it to the cross-correlation comparison unit 106.

[Equation 3]

$$C_2 = \frac{\sum_n L'(n)\,R'(n)}{\sqrt{\sum_n L'(n)^2 \, \sum_n R'(n)^2}} \quad \cdots (3)$$

where

n: sample number on the time axis

L'(n): L channel reconstructed signal

R'(n): R channel reconstructed signal

C2: cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal

[0021] The cross-correlation comparison unit 106 compares the cross-correlation coefficient C1 input from the original cross-correlation calculation unit 101 with the cross-correlation coefficient C2 input from the reconstructed cross-correlation calculation unit 105 according to the following equation (4), and outputs the cross-correlation comparison result α to the multiplexing unit 107.

[Equation 4]

$$\alpha = \sqrt{\frac{C_1}{C_2}} \quad \cdots (4)$$

where

C1: cross-correlation coefficient between the L channel signal and the R channel signal

C2: cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal

α: cross-correlation comparison result

[0022] The cross-correlation value C2 between the reconstructed stereo signals is usually greater than the cross-correlation value C1 between the original stereo signals. In such cases, since C2 is greater than C1, |α| ≤ 1 is satisfied, which makes α a parameter suitable for quantization and transmission.
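The comparison step and the remark about quantization can be sketched as follows, assuming the comparison takes the form α = sqrt(C1 / C2), which is the form that makes the decoded cross-correlation equal C1 in the decoder-side derivation. The clamping rules and the uniform quantizer (`quantize_uniform`, `bits`) are hypothetical additions for this example; the description does not specify a quantizer.

```python
import math

def comparison_result(c1, c2, eps=1e-12):
    """Cross-correlation comparison result, assumed to be sqrt(C1 / C2).

    Guards (assumptions of this sketch, not taken from the description):
    - if C2 is (near) zero, return 1.0 so the decoder adds no reverberation;
    - the ratio is clamped to [0, 1] so that |alpha| <= 1 always holds.
    """
    if abs(c2) < eps:
        return 1.0
    ratio = min(max(c1 / c2, 0.0), 1.0)
    return math.sqrt(ratio)

def quantize_uniform(alpha, bits=4):
    """Hypothetical uniform scalar quantizer on [0, 1].

    Returns the transmitted index and the value the decoder would recover.
    """
    levels = (1 << bits) - 1
    index = round(alpha * levels)
    return index, index / levels

alpha = comparison_result(0.4, 0.8)      # C1 = 0.4, C2 = 0.8
index, alpha_hat = quantize_uniform(alpha)
```

Because α is bounded by 1 in the usual case C2 > C1, a fixed-range quantizer such as this one can cover it without side information about its scale.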

The multiplexing unit 107 multiplexes the monaural signal encoding parameters input from the monaural signal encoding unit 103, the L channel adaptive filter parameters and the R channel adaptive filter parameters input from the stereo audio reconstruction unit 104, and the cross-correlation comparison result α input from the cross-correlation comparison unit 106, and transmits the resulting bit stream to the stereo speech decoding apparatus 200.

FIG. 4 is a block diagram showing a main configuration inside stereo audio reconstructing section 104.

[0025] The L channel adaptive filter 141 includes an adaptive filter and uses the L channel signal (L) and the monaural signal (M) input from the monaural signal generation unit 102 as a reference signal and an input signal, respectively. It obtains the adaptive filter parameters that minimize the mean square error between the reference signal and the input signal, and outputs them to the L channel synthesis filter 144 and the multiplexing unit 107. Hereinafter, the adaptive filter parameters obtained by the L channel adaptive filter 141 are called the L channel adaptive filter parameters.

[0026] The R channel adaptive filter 142 includes an adaptive filter and uses the R channel signal (R) and the monaural signal (M) input from the monaural signal generation unit 102 as a reference signal and an input signal, respectively. It obtains the adaptive filter parameters that minimize the mean square error between the reference signal and the input signal, and outputs them to the R channel synthesis filter 145 and the multiplexing unit 107. Hereinafter, the adaptive filter parameters obtained by the R channel adaptive filter 142 are called the R channel adaptive filter parameters.

[0027] The monaural signal decoding unit 143 performs speech decoding such as AMR-WB on the monaural signal encoding parameters input from the monaural signal encoding unit 103, and outputs the resulting monaural decoded signal (M') to the L channel synthesis filter 144 and the R channel synthesis filter 145.

[0028] The L channel synthesis filter 144 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 143 with the L channel adaptive filter parameters input from the L channel adaptive filter 141, and outputs the obtained L channel reconstructed signal (L') to the reconstructed cross-correlation calculating unit 105.

[0029] The R channel synthesis filter 145 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 143 with the R channel adaptive filter parameters input from the R channel adaptive filter 142, and outputs the obtained R channel reconstructed signal (R') to the reconstructed cross-correlation calculating unit 105.

FIG. 5 is a diagram for explaining the configuration and operation of the adaptive filter that constitutes the L channel adaptive filter 141. In this figure, n indicates a sample number on the time axis. $H(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_k z^{-k}$ is the model (transfer function) of the adaptive filter, for example an FIR (Finite Impulse Response) filter. Here, k indicates the order of the adaptive filter, and b = [b_0, b_1, ..., b_k] indicates the adaptive filter parameters. x(n) represents the input signal of the adaptive filter; in the case of the L channel adaptive filter 141, the monaural signal (M) input from the monaural signal generation unit 102 is used. y(n) represents the reference signal of the adaptive filter; in the case of the L channel adaptive filter 141, the L channel signal (L) is used.

[0031] The adaptive filter obtains and outputs the adaptive filter parameters b = [b_0, b_1, ..., b_k] that minimize the mean square error between the reference signal and the input signal according to the following equation (5).

[Equation 5]

$$\mathrm{MSE}(b) = E\{[e(n)]^2\} = E\{[y(n) - y'(n)]^2\} = E\Big\{\Big[y(n) - \sum_{i=0}^{k} b_i\,x(n-i)\Big]^2\Big\} \quad \cdots (5)$$

[0032] In this equation, E represents the statistical expectation operator, e(n) represents the prediction error, and k represents the filter order.

[0033] The adaptive filter constituting the R channel adaptive filter 142 differs from the adaptive filter constituting the L channel adaptive filter 141 only in that the R channel signal (R) is input as the reference signal y(n).
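One way to picture the minimization in equation (5) is a closed-form least-squares fit over a single frame. The sketch below replaces the statistical expectation with a sum over the frame and solves the normal equations directly instead of running an iterative adaptive algorithm; both choices, the zero initial state, and the names `fit_fir` and `fir_filter` are assumptions of this example, since the description does not say how the adaptive filter is updated.

```python
def fir_filter(b, x):
    """y'(n) = sum_i b_i * x(n - i), with x(n) = 0 for n < 0 (assumed)."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            for n in range(len(x))]

def fit_fir(x, y, order):
    """Least-squares b = [b_0..b_k] minimizing sum_n (y(n) - y'(n))^2."""
    taps = order + 1
    n_samples = len(x)

    def delayed(i, n):
        return x[n - i] if n - i >= 0 else 0.0

    # Normal equations R b = p built from frame sums.
    R = [[sum(delayed(i, n) * delayed(j, n) for n in range(n_samples))
          for j in range(taps)] for i in range(taps)]
    p = [sum(delayed(i, n) * y[n] for n in range(n_samples))
         for i in range(taps)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    # A singular R (for example a silent frame) is not handled in this sketch.
    A = [row + [rhs] for row, rhs in zip(R, p)]
    for col in range(taps):
        pivot = max(range(col, taps), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, taps):
            f = A[r][col] / A[col][col]
            for c in range(col, taps + 1):
                A[r][c] -= f * A[col][c]

    # Back substitution.
    b = [0.0] * taps
    for r in range(taps - 1, -1, -1):
        s = sum(A[r][c] * b[c] for c in range(r + 1, taps))
        b[r] = (A[r][taps] - s) / A[r][r]
    return b

# Hypothetical frame: the "L channel" is a known FIR-filtered copy of the mono signal.
mono = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, -0.5, 1.0]
true_b = [0.8, 0.3, -0.1]
left = fir_filter(true_b, mono)
b_left = fit_fir(mono, left, order=2)   # input x = M, reference y = L
```

For the R channel the call is the same with the R channel signal passed as the reference `y`.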

FIG. 6 is a flowchart showing an example of the procedure of stereo speech coding processing in stereo speech coding apparatus 100.

First, in step (hereinafter abbreviated as "ST") 151, the original cross-correlation calculation unit 101 calculates the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R).

[0036] Next, in ST152, monaural signal generation section 102 generates a monaural signal using the L channel signal and the R channel signal.

Next, in ST153, monaural signal encoding section 103 encodes the monaural signal to generate a monaural signal encoding parameter.

[0038] Next, in ST154, L channel adaptive filter 141 obtains an L channel adaptive filter parameter that minimizes the mean square error between the L channel signal and the monaural signal.

[0039] Next, in ST155, the R channel adaptive filter 142 obtains an R channel adaptive filter parameter that minimizes the mean square error between the R channel signal and the monaural signal.

[0040] Next, in ST156, monaural signal decoding section 143 performs decoding using the monaural signal encoding parameters, and generates a monaural decoded signal (M').

[0041] Next, in ST157, the L channel synthesis filter 144 reconstructs the L channel signal using the monaural decoded signal (M') and the L channel adaptive filter parameters, and generates the L channel reconstructed signal (L').

Next, in ST158, R channel synthesis filter 145 reconstructs the R channel signal using the monaural decoded signal (M') and the R channel adaptive filter parameters, and generates the R channel reconstructed signal (R').

[0043] Next, in ST159, reconstructed cross-correlation calculating section 105 calculates the cross-correlation coefficient C2 between the L channel reconstructed signal (L') and the R channel reconstructed signal (R').

Next, in ST160, cross-correlation comparison section 106 compares cross-correlation coefficient C1 with cross-correlation coefficient C2, and obtains the cross-correlation comparison result α.

[0045] Next, in ST161, multiplexing section 107 multiplexes and transmits the monaural signal encoding parameters, the L channel adaptive filter parameters, the R channel adaptive filter parameters, and the cross-correlation comparison result α.
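The sequence ST151 to ST161 can be sketched end to end under strong simplifications. In this example the monaural codec is an identity stand-in rather than AMR-WB, the adaptive filters have order k = 0 (a single least-squares gain per channel), α is assumed to be sqrt(C1 / C2), and the "bit stream" is a plain dictionary; all of these, and every name in the code, are assumptions made for illustration.

```python
import math

def xcorr(a, b):
    """Normalized cross-correlation (assumed form of the coefficients C1, C2)."""
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0.0 else 0.0

def encode_frame(left, right):
    # ST151: cross-correlation of the original channels.
    c1 = xcorr(left, right)
    # ST152: monaural downmix, equation (2).
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    # ST153 / ST156: identity stand-in for the mono encoder and local decoder.
    mono_params = list(mono)
    mono_dec = list(mono_params)
    # ST154 / ST155: order-0 least-squares gains (input M, reference L or R).
    energy = sum(m * m for m in mono) or 1.0   # guard for a silent frame
    b_left = sum(m * l for m, l in zip(mono, left)) / energy
    b_right = sum(m * r for m, r in zip(mono, right)) / energy
    # ST157 / ST158: local reconstruction of both channels.
    left_rec = [b_left * m for m in mono_dec]
    right_rec = [b_right * m for m in mono_dec]
    # ST159: cross-correlation of the reconstructed channels.
    c2 = xcorr(left_rec, right_rec)
    # ST160: comparison result, clamped so that 0 <= alpha <= 1.
    ratio = c1 / c2 if c2 != 0.0 else 1.0
    alpha = math.sqrt(min(max(ratio, 0.0), 1.0))
    # ST161: "multiplexing" into a dictionary.
    return {"mono": mono_params, "b_left": b_left,
            "b_right": b_right, "alpha": alpha}

frame = encode_frame([1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 1.5, 0.0])
```

With order-0 filters both reconstructed channels are scaled copies of the same monaural signal, so C2 is 1 whenever the gains share a sign. This is an extreme case of the observation that the reconstructed channels are usually more correlated than the originals.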

[0046] As described above, stereo speech coding apparatus 100 transmits the adaptive filter parameters obtained in L channel adaptive filter 141 and R channel adaptive filter 142 to stereo speech decoding apparatus 200 as spatial information parameters related to the inter-channel level difference (ILD) and the inter-channel time difference (ITD). Stereo speech coding apparatus 100 also transmits the cross-correlation comparison result α obtained in cross-correlation comparing section 106 to stereo speech decoding apparatus 200 as a spatial information parameter regarding the inter-channel cross-correlation (ICC) between the L channel signal and the R channel signal.

[0047] In this embodiment, stereo speech coding apparatus 100 may transmit the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R) instead of the cross-correlation comparison result α. Even in this case, the decoder side can calculate the cross-correlation coefficient C2 between the L channel reconstructed signal (L') and the R channel reconstructed signal (R'), and can therefore obtain α by calculation. As a result, the stereo speech coding apparatus 100 does not need to generate the L channel and R channel reconstructed signals, which reduces its amount of computation.

FIG. 7 is a block diagram showing the main configuration of stereo speech decoding apparatus 200.

[0049] Separating section 201 performs separation processing on the bit stream transmitted from stereo speech coding apparatus 100, outputs the obtained monaural signal encoding parameters, L channel adaptive filter parameters, and R channel adaptive filter parameters to stereo speech decoding section 202, and outputs the cross-correlation comparison result α to L channel spatial information reproduction unit 205 and R channel spatial information reproduction unit 206.

[0050] Stereo speech decoding section 202 decodes the L channel signal and the R channel signal using the monaural signal encoding parameters, the L channel adaptive filter parameters, and the R channel adaptive filter parameters input from separating section 201, and outputs the L channel reconstructed signal (L') obtained by decoding to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205. Stereo speech decoding section 202 likewise outputs the R channel reconstructed signal (R') obtained by decoding to R channel all-pass filter 204 and R channel spatial information reproduction unit 206. Details of stereo speech decoding section 202 will be described later.

[0051] The L channel all-pass filter 203 generates an L channel reverberation signal (L'_Rev) using the all-pass filter parameters representing the transfer function shown in the following equation (6) and the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and outputs it to the L channel spatial information reproduction unit 205.

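[Equation 6]

$$H_{allpass}(z) = \frac{a_N + a_{N-1} z^{-1} + \cdots + a_1 z^{-(N-1)} + z^{-N}}{1 + a_1 z^{-1} + \cdots + a_{N-1} z^{-(N-1)} + a_N z^{-N}} \quad \cdots (6)$$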

[0052] In this equation, H_allpass(z) represents the transfer function of the all-pass filter, a = [a_1, a_2, ..., a_N] indicates the all-pass filter parameters, and N indicates the order of the all-pass filter. Note that the input signal L' and the output signal L'_Rev of the L channel all-pass filter 203 are orthogonal, so their cross-correlation value is Correlation[L'(n), L'_Rev(n)] = 0. Moreover, since the energy of L' and the energy of L'_Rev are the same, |L'(n)|^2 = |L'_Rev(n)|^2 holds.
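The energy-preserving behaviour of an all-pass filter can be checked numerically with a small sketch. It assumes the common all-pass structure in which the numerator coefficients are the denominator coefficients [1, a_1, ..., a_N] in reverse order, which may differ in detail from equation (6); the parameter values and names are hypothetical. The sketch verifies only the energy property. How close the output is to orthogonal with the input depends on the chosen parameters and the signal, and is not verified here.

```python
def allpass(a, x):
    """Filter x with an order-N all-pass section defined by a = [a_1..a_N].

    Assumed transfer function:
        (a_N + a_{N-1} z^-1 + ... + a_1 z^-(N-1) + z^-N)
        / (1 + a_1 z^-1 + ... + a_N z^-N)
    The filter starts from a zero state; the poles must lie inside the unit circle.
    """
    order = len(a)
    den = [1.0] + list(a)      # 1, a_1, ..., a_N
    num = den[::-1]            # a_N, ..., a_1, 1
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(order + 1) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, order + 1) if n - i >= 0)
        y.append(acc)
    return y

def energy(sig):
    return sum(s * s for s in sig)

# Hypothetical second-order parameters a = [0.0, 0.25] (poles at +/-0.5j).
impulse = [1.0] + [0.0] * 255
h = allpass([0.0, 0.25], impulse)

# A short burst followed by silence, long enough for the filter tail to decay.
burst = [1.0, -2.0, 3.0] + [0.0] * 253
reverb = allpass([0.0, 0.25], burst)
```

Here `energy(h)` is 1 and `energy(reverb)` matches `energy(burst)` up to the negligible truncated tail, which is the equal-energy property used in the derivation of equations (9) and (10).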

The R channel all-pass filter 204 generates an R channel reverberation signal (R'_Rev) using the all-pass filter parameters representing the transfer function shown in the above equation (6) and the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and outputs it to the R channel spatial information reproduction unit 206.

[0054] The L channel spatial information reproduction unit 205 calculates and outputs the L channel decoded signal (L'') according to the following equation (7), using the cross-correlation comparison result α input from the separation unit 201, the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and the L channel reverberation signal (L'_Rev) input from the L channel all-pass filter 203.

[0055] [Equation 7]

$$L'' = \alpha\,L' + \sqrt{1-\alpha^2}\;L'_{Rev} \quad \cdots (7)$$

The R channel spatial information reproduction unit 206 calculates and outputs the R channel decoded signal (R'') according to the following equation (8), using the cross-correlation comparison result α input from the separation unit 201, the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and the R channel reverberation signal (R'_Rev) input from the R channel all-pass filter 204.

[Equation 8]

$$R'' = \alpha\,R' + \sqrt{1-\alpha^2}\;R'_{Rev} \quad \cdots (8)$$

As mentioned above, L' and L'_Rev are orthogonal and have the same energy, so the energy of the decoded signal (L'') is given by the following equation (9). Similarly, the energy of the decoded signal (R'') is given by the following equation (10).

[Equation 9]

$$|L''|^2 = \alpha^2 |L'|^2 + (1-\alpha^2)\,|L'_{Rev}|^2 + 2\alpha\sqrt{1-\alpha^2}\,(L' \cdot L'_{Rev}) = |L'|^2 \quad \cdots (9)$$

[Equation 10]

$$|R''|^2 = \alpha^2 |R'|^2 + (1-\alpha^2)\,|R'_{Rev}|^2 + 2\alpha\sqrt{1-\alpha^2}\,(R' \cdot R'_{Rev}) = |R'|^2 \quad \cdots (10)$$

The numerator of the cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') is given by the following equation (11). Here, if different filters are used for the L channel all-pass filter 203 and the R channel all-pass filter 204, the signals in the correlation calculations of the second to fourth terms on the right side of equation (11) are almost orthogonal, so these terms are much smaller than the first term and can be regarded as almost zero.

[Equation 11]

$$L'' \cdot R'' = \alpha^2 (L' \cdot R') + \alpha\sqrt{1-\alpha^2}\,(L' \cdot R'_{Rev}) + \alpha\sqrt{1-\alpha^2}\,(L'_{Rev} \cdot R') + (1-\alpha^2)\,(L'_{Rev} \cdot R'_{Rev}) \quad \cdots (11)$$

Therefore, from equations (4), (9), (10), and (11), the cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') is equal to the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R), as shown in the following equation (12). From the above, by calculating the decoded signals using the cross-correlation comparison result α according to equations (7) and (8), the L channel spatial information reproduction unit 205 and the R channel spatial information reproduction unit 206 can obtain two-channel decoded signals whose cross-correlation value is equal to the original cross-correlation value.

[Equation 12]

$$C_3 = \frac{L'' \cdot R''}{\sqrt{|L''|^2\,|R''|^2}} \approx \alpha^2\,\frac{L' \cdot R'}{\sqrt{|L'|^2\,|R'|^2}} = \alpha^2 C_2 = C_1 = \frac{\sum_n L(n)\,R(n)}{\sqrt{\sum_n L(n)^2 \, \sum_n R(n)^2}} \quad \cdots (12)$$
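The argument of equations (7) to (12) can be checked on a toy example. The sketch below uses hand-built vectors whose reverberation signals are exactly orthogonal to everything else and have the same energy as the reconstructed signals, which is an idealization of what the all-pass filters provide. The value of C1, the vectors, and the assumed relation α = sqrt(C1 / C2) are chosen for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def xcorr(a, b):
    """Normalized cross-correlation: (a . b) / sqrt(|a|^2 |b|^2)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def reproduce(rec, rev, alpha):
    """Equations (7)/(8): alpha * X' + sqrt(1 - alpha^2) * X'_Rev."""
    g = math.sqrt(1.0 - alpha * alpha)
    return [alpha * s + g * r for s, r in zip(rec, rev)]

# Idealized signals: each reverberation vector is orthogonal to both
# reconstructed vectors and to the other reverberation vector.
left_rec = [2.0, 1.0, 0.0, 0.0, 0.0, 0.0]    # L'
right_rec = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]   # R'
left_rev = [0.0, 0.0, 2.0, 1.0, 0.0, 0.0]    # L'_Rev, same energy as L'
right_rev = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]   # R'_Rev, same energy as R'

c1 = 0.4                                # hypothetical original coefficient
c2 = xcorr(left_rec, right_rec)         # 0.8 for these vectors
alpha = math.sqrt(c1 / c2)              # assumed form of the comparison result

left_dec = reproduce(left_rec, left_rev, alpha)      # L''
right_dec = reproduce(right_rec, right_rev, alpha)   # R''
c3 = xcorr(left_dec, right_dec)
```

In this idealized case the second to fourth terms of equation (11) vanish exactly, so `c3` equals `c1` and the energy of each decoded channel equals that of its reconstructed signal. With real all-pass filters the equality holds only approximately.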

FIG. 8 is a block diagram showing the main configuration inside stereo audio decoding section 202.

The monaural signal decoding unit 221 performs decoding using the monaural signal encoding parameters input from the separation unit 201, and outputs the obtained monaural decoded signal (M') to the L channel synthesis filter 222 and the R channel synthesis filter 223.

[0061] The L channel synthesis filter 222 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the L channel adaptive filter parameters input from the separation unit 201, and outputs the obtained L channel reconstructed signal (L') to the L channel all-pass filter 203 and the L channel spatial information reproduction unit 205.

The R channel synthesis filter 223 performs decoding by filtering the monaural decoded signal (M') input from the monaural signal decoding unit 221 with the R channel adaptive filter parameters input from the separation unit 201, and outputs the obtained R channel reconstructed signal (R') to the R channel all-pass filter 204 and the R channel spatial information reproduction unit 206.

FIG. 9 is a flowchart showing an example of a procedure of stereo speech decoding processing in stereo speech decoding apparatus 200.

[0064] First, in ST251, separation section 201 performs separation processing on the bit stream transmitted from stereo speech coding apparatus 100 and generates the monaural signal coding parameter, the L channel adaptive filter parameter, the R channel adaptive filter parameter, and the cross-correlation comparison result α.

Next, in ST252, monaural signal decoding section 221 decodes the monaural signal using the monaural signal encoding parameter to generate a monaural decoded signal (Μ ′).

[0066] Next, in ST253, L channel synthesis filter 222 performs a decoding process of filtering the monaural decoded signal (M') with the L channel adaptive filter parameter to generate an L channel reconstructed signal (L').

Next, in ST254, R channel synthesis filter 223 performs a decoding process of filtering the monaural decoded signal (M') with the R channel adaptive filter parameter to generate an R channel reconstructed signal (R').

[0068] Next, in ST255, the L channel all-pass filter 203 generates an L channel reverberation signal (L'_Rev) using the L channel reconstructed signal (L').

[0069] Next, in ST256, the R channel all-pass filter 204 generates an R channel reverberation signal (R'_Rev) using the R channel reconstructed signal (R').

Next, in ST257, L channel spatial information reproduction section 205 generates an L channel decoded signal (L'') using the L channel reconstructed signal (L'), the L channel reverberation signal (L'_Rev), and the cross-correlation comparison result α.

[0071] Next, in ST258, R channel spatial information reproduction section 206 generates an R channel decoded signal (R'') using the R channel reconstructed signal (R'), the R channel reverberation signal (R'_Rev), and the cross-correlation comparison result α.
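Steps ST255 to ST258 can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the transfer function of equation (6) is not reproduced here, so a Schroeder all-pass section with hypothetical gain and delay values stands in for all-pass filters 203 and 204, and the mixing rule is the form implied by equation (11). All function names are hypothetical, and the sketch starts from already reconstructed signals (the synthesis filtering of ST253 and ST254 is omitted).

```python
import math

def allpass(x, gain, delay):
    """Schroeder all-pass section: y[n] = -g*x[n] + x[n-d] + g*y[n-d]."""
    y = []
    for n, sample in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-gain * sample + x_d + gain * y_d)
    return y

def reproduce_spatial_info(recon, rev, alpha):
    """Assumed mixing rule: alpha*recon + sqrt(1 - alpha^2)*rev."""
    beta = math.sqrt(max(0.0, 1.0 - alpha * alpha))
    return [alpha * a + beta * b for a, b in zip(recon, rev)]

def decode_stereo(l_rec, r_rec, alpha):
    # Different (hypothetical) delays per channel, so the two reverberation
    # signals are only weakly correlated with each other.
    l_rev = allpass(l_rec, gain=0.5, delay=3)             # ST255
    r_rev = allpass(r_rec, gain=0.5, delay=5)             # ST256
    l_dec = reproduce_spatial_info(l_rec, l_rev, alpha)   # ST257
    r_dec = reproduce_spatial_info(r_rec, r_rev, alpha)   # ST258
    return l_dec, r_dec
```

When α is 1 the reverberation weight is zero and the decoded signals equal the reconstructed signals; smaller values of α mix in more of the reverberation signal and lower the inter-channel correlation.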

Thus, according to the present embodiment, stereo speech coding apparatus 100 transmits to stereo speech decoding apparatus 200 not only the L channel adaptive filter parameter and the R channel adaptive filter parameter, which are spatial information parameters regarding inter-channel level difference (ILD) and inter-channel time difference (ITD), but also the cross-correlation comparison result α, which is spatial information related to inter-channel cross-correlation (ICC). Since stereo speech decoding apparatus 200 performs stereo speech decoding using these pieces of information, the spatial image of the decoded speech can be improved.

In the present embodiment, the case where the L channel adaptive filter parameter and the R channel adaptive filter parameter are obtained and transmitted as spatial information parameters regarding the inter-channel level difference (ILD) and the inter-channel time difference (ITD) has been described as an example. However, the present invention is not limited to this, and a spatial information parameter indicating inter-channel difference information other than the L channel adaptive filter parameter and the R channel adaptive filter parameter may be obtained and transmitted. Further, in the present embodiment, the case where the cross-correlation comparison unit 106 obtains the cross-correlation comparison result according to the above equation (4) has been described as an example, but the present invention is not limited to this, and another comparison result that uniquely represents the difference between the cross-correlation coefficient C1 and the cross-correlation coefficient C2 may be obtained.
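As an illustration of the encoder-side comparison, the following sketch computes C1, C2, and a comparison result α. It assumes that the cross-correlation coefficients are normalized cross-correlations and that equation (4) takes the form α = sqrt(C1/C2), which is the value that makes α²·C2 equal to C1 in the decoder-side argument. The clamping of the ratio and the handling of C2 = 0 are additions made here for robustness and are not taken from the specification; the function names and sample values are hypothetical.

```python
import math

def cross_correlation(x, y):
    """Normalized cross-correlation coefficient; 0.0 if either signal is silent."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0

def compare_cross_correlation(c1, c2):
    """Assumed comparison: alpha = sqrt(C1 / C2), with the ratio clamped to [0, 1]."""
    if c2 == 0.0:
        return 1.0  # assumption: nothing to compare against, leave signals unchanged
    ratio = c1 / c2
    return math.sqrt(min(max(ratio, 0.0), 1.0))

# Original channels and (more strongly correlated) reconstructed channels.
left = [1.0, 0.5, -0.5, -1.0]
right = [0.5, 1.0, -1.0, -0.5]
left_rec = [0.9, 0.6, -0.6, -0.9]
right_rec = [0.6, 0.9, -0.9, -0.6]

c1 = cross_correlation(left, right)          # original correlation
c2 = cross_correlation(left_rec, right_rec)  # correlation after reconstruction
alpha = compare_cross_correlation(c1, c2)    # value transmitted to the decoder
```

Because reconstruction from a common monaural signal tends to raise the inter-channel correlation, C2 is typically larger than C1 and α falls below 1.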

[0075] In this embodiment, the case where the L channel all-pass filter 203 and the R channel all-pass filter 204 generate the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) using fixed all-pass filter parameters has been described as an example; however, all-pass filter parameters transmitted from stereo speech coding apparatus 100 may be used instead.

[0076] Further, in the present embodiment, FIG. 6 and FIG. 9 show, as an example of a procedure, the processing of each step performed serially; however, some steps can be reordered or parallelized. For example, the L channel adaptive filter parameter is calculated in ST154 and the R channel adaptive filter parameter is calculated in ST155, but the order of these two steps may be swapped so that the R channel adaptive filter parameter is calculated in ST154 and the L channel adaptive filter parameter is calculated in ST155, or the processing in ST154 and ST155 may be performed in parallel. Further, the decoding of the monaural signal performed in ST156 may be performed before ST154 or before ST155, or may be processed in parallel with ST154 or ST155. In the same way, the order of ST157 and ST158, the order of ST253 and ST254, the order of ST255 and ST256, and the order of ST257 and ST258 may each be swapped, or each pair may be processed in parallel. Further, ST151 may be performed at any timing from the start to ST159.
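Because the L channel and R channel steps do not depend on each other, they can be dispatched concurrently. The sketch below shows one way to do this with Python's standard `concurrent.futures` module; `decode_branch` is a hypothetical placeholder for the per-channel work (for example ST255 followed by ST257 for the L channel), not the processing defined in the specification.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_branch(recon, alpha):
    """Placeholder per-channel work: here it only scales the samples by alpha."""
    return [alpha * s for s in recon]

def decode_channels_in_parallel(l_rec, r_rec, alpha):
    """Run the L and R branches concurrently and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        l_future = pool.submit(decode_branch, l_rec, alpha)
        r_future = pool.submit(decode_branch, r_rec, alpha)
        # result() blocks until the corresponding branch has finished.
        return {"L": l_future.result(), "R": r_future.result()}
```

The result is the same as running the two branches one after the other in either order, which is the property the paragraph above relies on.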

In this embodiment, in FIGS. 7 and 8, the monaural decoded signal (M') generated by monaural signal decoding section 221 is not output to the outside of stereo audio decoding apparatus 200. However, the present invention is not limited to this. For example, when the generation of the L channel decoded signal (L'') or the R channel decoded signal (R'') fails, the monaural decoded signal (M') may be output to the outside of stereo audio decoding apparatus 200 and used as the decoded audio of stereo audio decoding apparatus 200.

Further, in the present embodiment, the case has been described as an example in which stereo speech reconstruction unit 104 of stereo speech coding apparatus 100 obtains the L channel reconstructed signal (L') and the R channel reconstructed signal (R') using the L channel adaptive filter parameter and the R channel adaptive filter parameter, which are obtained by encoding the L channel signal (L) and the R channel signal (R) with the monaural signal (M) input from monaural signal generation unit 102, together with the monaural decoded signal (M') obtained by decoding with the monaural signal encoding parameter input from monaural signal encoding unit 103. However, the present invention is not limited to this, and stereo speech reconstruction unit 104 may obtain the L channel reconstructed signal (L') and the R channel reconstructed signal (R') by encoding and decoding the L channel signal (L) and the R channel signal (R) individually, without using the monaural signal (M) or the monaural signal encoding parameter. In such a case, the stereo speech coding apparatus need not include monaural signal generation unit 102 and monaural signal encoding unit 103. Also in such a case, instead of the L channel adaptive filter parameter and the R channel adaptive filter parameter, an L channel coding parameter and an R channel coding parameter are generated by the encoding of the L channel signal (L) and the R channel signal (R) in the stereo speech reconstruction unit. For this reason, the bit stream output from this stereo speech coding apparatus need not include a monaural signal coding parameter.

[0079] A stereo speech decoding apparatus corresponding to such a stereo speech coding apparatus is the stereo speech decoding apparatus 200 shown in Fig. 7 without the use of monaural signal coding parameters. That is, when the monaural signal encoding parameter is not included in the bit stream, the monaural signal encoding parameter is not output from the separation unit 201. Further, the stereo speech decoding unit 202 need not include the monaural signal decoding unit 221, and may obtain the L channel reconstructed signal (L') and the R channel reconstructed signal (R') by applying to the L channel coding parameter and the R channel coding parameter a decoding process similar to the one performed within the stereo speech reconstruction unit of the corresponding stereo speech coding apparatus.

[0080] (Embodiment 2)

In Embodiment 1, the decoding side uses the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) to generate the L channel and R channel decoded signals. However, the present invention is not limited to this, and a configuration using a monaural reverberation signal (M'_Rev) instead of the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) may be used. Embodiment 2 describes a specific configuration and operation for that case.

[0081] The configuration and operation of the stereo speech coding apparatus according to the present embodiment are the same as those of Embodiment 1 except for the operation of cross-correlation comparing section 106 in FIG. Cross-correlation comparison section 106 according to the present embodiment obtains cross-correlation comparison result α using equation (13) instead of equation (4).

where C1: cross-correlation coefficient between the L channel signal and the R channel signal

C2: cross-correlation coefficient between the L channel reconstructed signal and the R channel reconstructed signal

α: cross-correlation comparison result

FIG. 10 is a block diagram showing the main configuration of stereo speech decoding apparatus 300 according to the present embodiment. Here, the configuration and operation of separation section 201 and stereo speech decoding section 202 are the same as the configuration and operation of separation section 201 and stereo speech decoding section 202 of stereo speech decoding apparatus 200 shown in FIG. Therefore, their explanation is omitted.

The monaural signal generation unit 301 calculates and outputs a monaural reconstructed signal (M') using the L channel reconstructed signal (L') and the R channel reconstructed signal (R') input from the stereo speech decoding unit 202. The monaural reconstructed signal (M') is calculated in the same manner as the monaural signal (M) in the monaural signal generation unit 102 in FIG.

[0084] The monaural signal all-pass filter 302 generates a monaural reverberation signal (M'_Rev) using the all-pass filter parameter and the monaural reconstructed signal (M') input from the monaural signal generation unit 301, and outputs it to the L channel spatial information reproduction unit 303 and the R channel spatial information reproduction unit 304. Here, the all-pass filter parameter is represented by the transfer function shown in equation (6), as with the L channel all-pass filter 203 and the R channel all-pass filter 204 shown in FIG.

The L channel spatial information reproduction unit 303 calculates and outputs the L channel decoded signal (L'') according to the following equation (14), using the cross-correlation comparison result α input from the separation unit 201, the L channel reconstructed signal (L') input from the stereo speech decoding unit 202, and the monaural reverberation signal (M'_Rev) input from the monaural signal all-pass filter 302.

[Equation 14]

Similarly, the R channel spatial information reproduction unit 304 calculates and outputs the R channel decoded signal (R'') according to the following equation (15), using the cross-correlation comparison result α input from the separation unit 201, the R channel reconstructed signal (R') input from the stereo speech decoding unit 202, and the monaural reverberation signal (M'_Rev) input from the monaural signal all-pass filter 302.

[Equation 15]

R'' = αR' − … (15)

Here, since L' and M'_Rev can be regarded as almost orthogonal, the energy of the L channel decoded signal (L'') is given by the following equation (16). Similarly, since R' and M'_Rev are almost orthogonal, the energy of the R channel decoded signal (R'') is expressed by the following equation (17).

[Equation 16]

[Equation 17]

Also, from the orthogonality between L' and M'_Rev and the orthogonality between R' and M'_Rev, the numerator of the cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') is given by equation (18). Therefore, from equations (13), (16), (17), and (18), the cross-correlation value C3 between the L channel decoded signal (L'') and the R channel decoded signal (R'') is equal to the cross-correlation coefficient C1 between the original L channel signal (L) and R channel signal (R), as shown in equation (19). From the above, the L channel spatial information reproduction unit 303 and the R channel spatial information reproduction unit 304 calculate the decoded signals using the cross-correlation comparison result α according to equations (14) and (15), whereby a two-channel decoded signal whose cross-correlation value is equal to the original cross-correlation value C1 can be obtained.

[Equation 18]

As described above, according to the present embodiment, even when a monaural reverberation signal (M'_Rev) is used instead of the L channel reverberation signal (L'_Rev) and the R channel reverberation signal (R'_Rev) in generating the L channel and R channel decoded signals on the decoding side, the spatial information contained in the stereo signal can be reproduced, and the spatial image of the decoded stereo audio signal can be improved.

[0090] Also, according to the present embodiment, instead of generating two types of reverberation signals for the L channel and the R channel on the decoding side, it is only necessary to generate one reverberation signal for the monaural signal, so the amount of calculation for generating reverberation signals can be reduced.
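The single-reverberation variant can be illustrated numerically. Equations (13) to (15) are not legible in this text, so the following is not the formula of the specification but one energy-preserving choice consistent with the description: the monaural reverberation term is added to the L channel and subtracted from the R channel, its weight is chosen so that each channel keeps its energy, and α is assumed to be sqrt((1 + C1)/(1 + C2)). All names and values are hypothetical, and the monaural reverberation signal is hand-built to be exactly orthogonal to both reconstructed channels.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross_correlation(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def mix_with_mono_reverb(recon, mono_rev, alpha, sign):
    """Assumed rule: alpha*recon + sign*beta*mono_rev, with beta chosen so the
    channel energy is unchanged when recon and mono_rev are orthogonal."""
    beta = math.sqrt((1.0 - alpha * alpha) * dot(recon, recon) / dot(mono_rev, mono_rev))
    return [alpha * a + sign * beta * m for a, m in zip(recon, mono_rev)]

l_rec = [2.0, 0.0, 0.0]
r_rec = [1.0, 1.0, 0.0]
mono_rev = [0.0, 0.0, 1.5]   # stands in for the output of all-pass filter 302

c1 = 0.5                                     # target correlation of the original signals
c2 = cross_correlation(l_rec, r_rec)
alpha = math.sqrt((1.0 + c1) / (1.0 + c2))   # assumed comparison result

l_dec = mix_with_mono_reverb(l_rec, mono_rev, alpha, +1.0)
r_dec = mix_with_mono_reverb(r_rec, mono_rev, alpha, -1.0)
c3 = cross_correlation(l_dec, r_dec)         # matches c1 up to rounding
```

Only one reverberation signal is computed here, which is the source of the calculation saving compared with Embodiment 1.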

[0091] In the present embodiment, the case where the monaural reconstructed signal (M') is calculated by the monaural signal generation unit 301 has been described as an example; however, the present invention is not limited to this, and when stereo audio decoding unit 202 has a monaural signal decoding unit that decodes a monaural signal, as shown in FIG. 8, the monaural reconstructed signal (M') may be obtained directly from stereo audio decoding unit 202.

[0092] The embodiments of the present invention have been described above.

[0093] In the above embodiments, the left channel is the L channel and the right channel is the R channel. It goes without saying that the positional relationship between the left and right is not limited by this notation.

Furthermore, although the stereo speech decoding apparatus in each of the above embodiments has been described as receiving and processing the bit stream transmitted by the stereo speech coding apparatus in each of the above embodiments, the present invention is not limited to this. The bit stream received and processed by the stereo speech decoding apparatus in each of the above embodiments may be any bit stream transmitted by an encoding apparatus capable of generating a bit stream that this decoding apparatus can process.

[0095] Further, the stereo speech coding apparatus and stereo speech decoding apparatus according to the present invention can be mounted on a communication terminal apparatus in a mobile communication system, whereby a communication terminal apparatus having the same effects as described above can be provided.

[0096] Further, although the case where the present invention is configured by hardware has been described here as an example, the present invention can also be realized by software. For example, by describing the algorithm of the stereo speech encoding method / decoding method according to the present invention in a programming language, storing the program in a memory, and executing it by an information processing means, functions similar to those of the stereo speech coding apparatus / decoding apparatus according to the present invention can be realized.

Further, each functional block used in the description of each of the above embodiments is typically realized as an LSI that is an integrated circuit. These may be individually made into one chip, or may be made into one chip so as to include some or all of them.

[0098] Although the term LSI is used here, it may also be called IC, system LSI, super LSI, or ultra LSI, depending on the degree of integration.

Further, the method of circuit integration is not limited to LSI, and implementation using dedicated circuitry or general purpose processors is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacturing, or a reconfigurable processor in which the connections or settings of circuit cells inside the LSI can be reconfigured, may also be used.

[0100] Further, if integrated circuit technology that replaces LSI emerges as a result of progress in semiconductor technology or other derived technology, it is naturally also possible to integrate the functional blocks using this technology. There is a possibility of applying nanotechnology.

[0101] The disclosures of Japanese Patent Application No. 2006-213634, filed on August 4, 2006, and Japanese Patent Application No. 2007-157759, filed on June 14, 2007, including the specifications, drawings, and abstracts, are incorporated herein by reference in their entirety.

Industrial applicability

[0102] The stereo speech coding apparatus, the stereo speech decoding apparatus, and the methods thereof according to the present invention can be applied to uses such as stereo speech coding in mobile communication terminals.

Claims

The scope of the claims

[1] First calculating means for calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo sound;

Stereo audio reconstructing means for generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal;

Second calculating means for calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal;

A comparison means for obtaining a cross-correlation comparison result including spatial information of the stereo speech by comparing the first cross-correlation coefficient and the second cross-correlation coefficient;

A stereo speech coding apparatus comprising:

[2] The first calculating means calculates the first cross-correlation coefficient according to equation (1),

[Equation 1]

\[ C_1 = \frac{\sum_n L(n)\,R(n)}{\sqrt{\sum_n L(n)^2 \sum_n R(n)^2}} \quad \cdots (1) \]

where n: sample number on the time axis

L(n): first channel signal

R(n): second channel signal

C1: cross-correlation coefficient between the first channel signal and the second channel signal,

The second calculating means calculates the second cross-correlation coefficient according to equation (2),

[Equation 2]

\[ C_2 = \frac{\sum_n L'(n)\,R'(n)}{\sqrt{\sum_n L'(n)^2 \sum_n R'(n)^2}} \quad \cdots (2) \]

where n: sample number on the time axis

L'(n): first channel reconstructed signal

R'(n): second channel reconstructed signal

C2: cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal,

The comparison means obtains the cross-correlation comparison result according to equation (3),

[Equation 3]

where C1: cross-correlation coefficient between the first channel signal and the second channel signal

C2: cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal

α: cross-correlation comparison result,

The stereo speech coding apparatus according to claim 1.

[3] a monaural signal generating means for generating a monaural signal using the first channel signal and the second channel signal;

Monaural signal encoding means for generating a monaural signal encoding parameter by encoding the monaural signal;

Further comprising

The stereo audio reconstruction means includes:

Generating a first channel reconstructed signal and a second channel reconstructed signal by using the monaural signal and the monaural signal encoding parameter for each of the first channel signal and the second channel signal;

The stereo speech coding apparatus according to claim 1.

[4] The stereo sound reconstruction means includes:

A first adaptive filter for obtaining a first adaptive filter parameter that minimizes a mean square error between the monaural signal and the first channel signal;

A second adaptive filter for obtaining a second adaptive filter parameter that minimizes a mean square error between the monaural signal and the second channel signal;

Monaural signal decoding means for generating a monaural decoded signal by decoding the monaural signal using the monaural signal encoding parameter;

A first synthesis filter for generating the first channel reconstructed signal by filtering the monaural decoded signal with the first adaptive filter parameter;

A second synthesis filter for generating the second channel reconstructed signal by filtering the monaural decoded signal with the second adaptive filter parameter;

Comprising

The stereo speech coding apparatus according to claim 3.

[5] Separation means for obtaining, from a received bit stream, a first parameter and a second parameter that are generated in an encoding device and relate respectively to a first channel signal and a second channel signal constituting stereo sound, and a cross-correlation comparison result including spatial information about the stereo sound, the cross-correlation comparison result being obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;

Stereo audio decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;

Stereo reverberation signal generating means for generating a first channel reverberation signal using the first channel reconstructed decoded signal and generating a second channel reverberation signal using the second channel reconstructed decoded signal;

First spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result;

Second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result;

A stereo speech decoding apparatus comprising:

[6] The stereo reverberation signal generating means includes:

A first all-pass filter that generates the first channel reverberation signal by performing all-pass filtering on the first channel reconstructed decoded signal;

A second all-pass filter that generates the second channel reverberation signal by performing all-pass filtering on the second channel reconstructed decoded signal;

The stereo speech decoding apparatus according to claim 5, further comprising:

[7] Separation means for obtaining, from a received bit stream, a first parameter and a second parameter that are generated in an encoding device and relate respectively to a first channel signal and a second channel signal constituting stereo sound, and a cross-correlation comparison result including spatial information about the stereo sound, the cross-correlation comparison result being obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;

Stereo audio decoding means for generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;

Monaural reverberation signal generating means for generating a monaural reverberation signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal;

First spatial information reproduction means for generating a first channel decoded signal using the first channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result;

Second spatial information reproduction means for generating a second channel decoded signal using the second channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result;

A stereo audio decoding device comprising:

[8] The monaural reverberation signal generating means includes:

Monaural signal generating means for generating a monaural reconstructed signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal;

A monaural signal all-pass filter that generates the monaural reverberation signal by all-pass filtering the monaural reconstructed signal;

The stereo speech decoding apparatus according to claim 7, further comprising:

[9] Calculating a first cross-correlation coefficient between a first channel signal and a second channel signal constituting stereo sound;

Generating a first channel reconstructed signal and a second channel reconstructed signal using the first channel signal and the second channel signal;

Calculating a second cross-correlation coefficient between the first channel reconstructed signal and the second channel reconstructed signal;

Obtaining a cross-correlation comparison result including spatial information of the stereo speech by comparing the first cross-correlation coefficient and the second cross-correlation coefficient;

Stereo audio encoding method comprising:

[10] Obtaining, from a received bit stream, a first parameter and a second parameter that are generated in an encoding device and relate respectively to a first channel signal and a second channel signal constituting stereo sound, and a cross-correlation comparison result including spatial information about the stereo sound, the cross-correlation comparison result being obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;

Generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;

Generating a first channel reverberation signal using the first channel reconstructed decoded signal and generating a second channel reverberation signal using the second channel reconstructed decoded signal;

Generating a first channel decoded signal using the first channel reconstructed decoded signal, the first channel reverberation signal, and the cross-correlation comparison result;

Generating a second channel decoded signal using the second channel reconstructed decoded signal, the second channel reverberation signal, and the cross-correlation comparison result;

Stereo audio decoding method comprising:

[11] Obtaining, from a received bit stream, a first parameter and a second parameter that are generated in an encoding device and relate respectively to a first channel signal and a second channel signal constituting stereo sound, and a cross-correlation comparison result including spatial information about the stereo sound, the cross-correlation comparison result being obtained by comparing a first cross-correlation between the first channel signal and the second channel signal with a second cross-correlation between a first channel reconstructed signal and a second channel reconstructed signal generated using the first channel signal and the second channel signal;

Generating a first channel reconstructed decoded signal and a second channel reconstructed decoded signal using the first parameter and the second parameter;

Generating a monaural reverberation signal using the first channel reconstructed decoded signal and the second channel reconstructed decoded signal;

Generating a first channel decoded signal using the first channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result;

Generating a second channel decoded signal using the second channel reconstructed decoded signal, the monaural reverberation signal, and the cross-correlation comparison result;

Stereo audio decoding method comprising: