Generation of a multi-channel audio signal
The present invention relates to the generation of a multi-channel audio signal by spatial audio decoding, in particular but not exclusively from a matrix-encoded surround sound stereo signal.
Digital encoding of various source signals has become increasingly important during the last decade because digital signal representation and communication increasingly replaces analog representation and communication. For example, mobile telephone systems such as the Global System for Mobile communication (GSM) are based on digital speech coding. Also, distribution of media content such as video and music is increasingly based on digital content encoding.
Furthermore, in the last decade there has been a trend towards multi-channel audio, in particular towards spatial audio that extends beyond traditional stereo signals. For example, traditional stereo recording includes only two channels, whereas modern advanced audio systems typically use five or six channels, as in popular 5.1 surround sound systems. This provides a more involved listening experience in which the user may be surrounded by sound sources.
Various techniques and standards have been developed for the communication of such multi-channel signals. For example, six discrete channels representing a 5.1 surround system may be transmitted according to a standard such as the Advanced Audio Coding (AAC) or Dolby Digital standard.
However, in order to provide backward compatibility, it is known to down-mix a higher number of channels to a lower number. It is particularly common to down-mix a 5.1 surround sound signal to a stereo signal, which allows the stereo signal to be reproduced by a conventional (stereo) decoder and the 5.1 signal to be reproduced by a surround sound decoder.
Such existing methods for backward compatible multi-channel transmission that do not require extra multi-channel information can typically be characterized as matrixed-surround methods. Examples of matrix surround sound encoding include methods such as Dolby Pro Logic II and Logic-7. The common principle of these methods is that they matrix-multiply the multiple channels of the input signal by a suitable non-square matrix, thereby generating an output signal with a lower number of channels. In particular, matrix encoders typically apply a phase shift to the surround channels before mixing them with the front and center channels. The generation of the down-mix signal (Lt, Rt) may, for example, be given by a matrix equation (equation 1) with the following terms.
Thus, the left down-mix signal (Lt) consists of a left front signal (Lf), a center signal (C) multiplied by a coefficient q, a left surround signal (Ls) phase-rotated by 90 degrees ('j') and scaled by a coefficient a, and finally a right surround signal (Rs) also phase-rotated by 90 degrees and scaled by a coefficient b. The right down-mix signal (Rt) is similarly generated. The down-mix coefficients are typically 0.707 for q and a and 0.408 for b.
The rationale for the opposite signs in the right down-mix signal (Rt) is that the surround channels are thereby mixed in anti-phase in the down-mix pair (Lt, Rt). This property helps the decoder to distinguish the front and back channels from the down-mix signal pair. By applying a de-matrixing operation, the decoder is able to (partially) reconstruct the multi-channel signal from the stereo down-mix. How closely the reconstructed multi-channel signal resembles the original multi-channel signal depends on the specific properties of the multi-channel audio content.
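By way of illustration, the down-mix relation referred to as equation 1 may be written out from the description above as follows. The Lt row follows the text directly. The Rt row is an assumption: it is inferred from the statement that Rt is generated similarly, with the surround terms carrying the opposite sign and the coefficients a and b exchanged.

```latex
\begin{equation}
\begin{bmatrix} L_t \\ R_t \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & q & ja  & jb \\
0 & 1 & q & -jb & -ja
\end{bmatrix}
\begin{bmatrix} L_f \\ R_f \\ C \\ L_s \\ R_s \end{bmatrix}
\tag{1}
\end{equation}
```

Here R_f denotes the right front signal, and j denotes the 90 degree phase rotation.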
Although matrixed surround sound systems provide backward compatibility, they can only provide low audio quality compared to discrete surround systems/encoders such as AAC or Dolby Digital systems.
An encoding/decoding technique called Spatial Audio Coding (SAC) has been developed to provide improved quality of a down-mixed audio signal. In SAC, the encoder down-mixes the channels to a lower number and additionally generates parametric data that describes the characteristics of the multi-channel signal relative to the down-mixed signal. The additional parametric data is then included in the bitstream together with the down-mix signal, which is typically a mono or stereo audio signal. Thus, a conventional decoder is able to ignore the additional parametric data and regenerate a mono or stereo signal (or possibly a matrix decoded low quality surround sound signal). In addition, SAC decoders can extract and use the parametric data to generate higher quality multi-channel signals.
However, this approach has the problem that many systems are not equipped for SAC encoded signals. For example, many systems only utilize matrix surround sound coding that does not generate SAC parametric data. Furthermore, many signal and decoder standards do not provide the flexibility to allow additional parametric data to be included, thus requiring a full conversion to a new standard before SAC can be used. This may require that all existing encoders and decoders in the system be replaced with SAC-enabled encoders and decoders. In particular, there are many conventional systems based on two-channel stereo (such as radio broadcasting, digital radio broadcasting, etc.) in which adding the additional information necessary for SAC is impracticable, i.e. the cost of expanding the system to use SAC is too high. Furthermore, a large amount of matrix encoded audio material is already available and this would need to be re-encoded by a SAC encoder before the benefits of SAC decoding are realized.
Hence, an improved system for processing and/or communicating a multi-channel audio signal would be advantageous and in particular a functionality allowing increased flexibility, increased audio quality, increased applicability of the SAC principle and/or improved performance would be advantageous.
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to a first aspect of the present invention, there is provided a decoder for generating a multi-channel audio signal, the decoder comprising: means for receiving a first signal comprising a first set of audio channels; estimating means for generating estimated parametric data for a second set of audio channels in response to characteristics of the first set of audio channels; the estimated parametric data relating characteristics of the second set of audio channels to characteristics of the first set of audio channels; a spatial audio decoder for decoding the first signal in response to the estimated parametric data to generate a multi-channel audio signal comprising a second set of channels.
The invention may allow improved performance. In particular, the invention may allow the spatial audio decoding principle to be used for signals that do not contain Spatial Audio Coding (SAC) parameters. The applicability of the decoder can be greatly increased and it can be used, for example, with existing matrix encoders and matrix encoded signals. Improved audio quality may be achieved by spatial audio decoding.
The second set of channels typically includes more channels than the first set of channels. The second set of audio channels may include one or more of the first set of audio channels. One or more of the second set of audio channels may be generated without using the estimated parametric data. The estimated parametric data may specifically be data corresponding to spatial audio parameters and in particular to spatial audio parameters as typically generated by a conventional SAC encoder.
The estimated parametric data may directly relate characteristics of the first set of channels to characteristics of the second set of channels and/or may for example comprise data values which relate properties of different channels of the second set of channels, thereby indicating how the first signal can be decoded to provide the second set of audio channels. The characteristic may be a series of measurements of a single parameter at different time intervals. Alternatively, the characteristic may relate to more than a single parameter.
According to an optional feature of the invention, the first signal does not comprise parametric audio data associated with the second set of channels.
The invention allows the spatial audio decoding principle to be applied to signals that do not comprise parametric audio data for at least some of the output channels. Thus, the invention may allow for an improvement of the quality of non-SAC encoded signals. The invention may allow improved backward compatibility and may in particular allow improved audio quality for surround sound signals decoded from matrix encoded surround sound signals.
According to an optional feature of the invention, the estimating means comprises means for determining first parametric data for the first set of audio channels and means for mapping the first parametric data to estimated parametric data for the second set of audio channels.
This may allow for efficient implementation and estimation of the parametric data, which may provide a particularly high decoded audio quality. The mapping may be, for example, by using a look-up table or by calculation of a mathematical function. Thus, there is a direct relationship between the estimated parameter value and the specific parameter value of the first parameter data.
According to an optional feature of the invention, the first parametric data comprises at least one inter-channel level difference for at least two audio channels of the first set of audio channels.
This may allow for efficient implementation and estimation of parametric data, which may provide a particularly high decoded audio quality. In particular, studies have shown that inter-channel level differences are particularly suitable for estimating associated SAC parameter data from matrix encoded surround sound signals. The inventors of the present invention have realized that there is a high degree of correlation between inter-channel level differences of e.g. stereo matrix coded surround sound signals and SAC data of the surround sound signals.
According to an optional feature of the invention, the first parametric data comprises at least one inter-channel correlation coefficient value for at least two audio channels of the first set of audio channels.
This may allow efficient implementation and estimation of parametric data, which may provide a particularly high decoded audio quality. In particular, studies have shown that inter-channel correlation coefficient values are particularly suitable for estimating associated SAC parametric data from matrix encoded surround sound signals. The inventors of the present invention have realized that there is a high correlation between inter-channel correlation coefficients for e.g. stereo matrix encoded surround sound signals and SAC data for surround sound signals.
According to an optional feature of the invention, the multi-channel audio signal is a surround sound signal and the estimated parametric data comprises at least one parameter selected from the group consisting of: an inter-channel level difference between a front left and a surround left channel of the second set of channels; an inter-channel level difference between a front right and a surround right channel of the second set of channels; an inter-channel correlation coefficient between a front left and a surround left channel of the second set of channels; an inter-channel correlation coefficient between a front right and a surround right channel of the second set of channels; a prediction coefficient for a center channel of the second set of audio channels; and an inter-channel level difference between the center channel and another channel (or combination of channels) of the second set of channels.
This may allow particularly high performance. In particular, these parameters are particularly suitable for generating high quality decoded signals by spatial audio decoders and typically have a high correlation with parameters of input signals such as matrix encoded surround sound signals.
The at least one parameter selected from the group may be generated by direct mapping from inter-channel level differences and/or inter-channel correlation coefficient values of at least two audio channels of the first set of audio channels.
According to an optional feature of the invention, the decoder further comprises means for generating time-frequency tiles, wherein the estimating means is configured to generate estimated parametric data for the time-frequency tiles.
This facilitates handling and/or improves quality. In particular, it may allow for an easy and/or improved mapping between the parameters extracted from the first signal and the estimated parameter data.
According to an optional feature of the invention, the estimating means comprises means for directly mapping a set of at least one signal characteristic for the first set of audio channels of a time-frequency tile to parametric data values for the second set of audio channels.
This may allow efficient implementation and estimation of parametric data, which may provide a particularly high decoded audio quality. The mapping may be by using a look-up table or by calculation of a mathematical function, for example. Thereby, a direct relation is applied between the set of signal characteristics and the values of the corresponding estimated parameter data. The signal characteristics may be inter-channel level differences and/or inter-channel correlation coefficients of two channels of the first set of audio channels, which may be directly mapped, for example, to prediction coefficients and/or inter-channel correlation coefficients and/or inter-channel level differences of the second set of audio channels.
According to an optional feature of the invention, the spatial audio decoder is configured to perform at least one matrix operation using parameters determined in response to the estimated parametric data.
This may allow high performance. In particular it may allow a suitable implementation with a high decoding quality.
According to an optional feature of the invention, the decoder further comprises means for extracting parametric data for a second signal, and the spatial audio decoder is operable to decode the second signal in response to the extracted parametric data.
The decoder may be configured to process both SAC-encoded signals and non-SAC-encoded signals using the same spatial audio decoder. For SAC encoded signals the extracted data may be used and for non-SAC encoded signals the estimated parametric data may be used. The invention may provide increased applicability and/or backwards compatibility. The device may be configured to decode the first signal in response to the extracted parametric data, thereby allowing the correlation between the first and second signals to be exploited.
According to an optional feature of the invention, the decoder further comprises means for selecting the decoding mode in response to a characteristic of the first signal.
The decoder may for example be configured to operate in a first mode in which SAC parameter data is estimated, and in a second mode in which SAC parameter data is extracted from the received signal, and the decoder may be configured to select between the first and second modes in response to whether the first signal includes SAC data. Thus, a highly flexible decoder with the ability to handle a variety of different types of signals can be implemented.
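The mode selection described above may be sketched as follows. This is a minimal illustration only: the dictionary layout of the received signal (the keys "downmix" and "sac_parameters") and the function name are hypothetical and are not taken from any SAC bitstream format.

```python
def select_parameters(received, estimate):
    """Pick the SAC parameters to use for one received signal.

    received is a hypothetical dict with a "downmix" entry and an optional
    "sac_parameters" entry. estimate is a callable that derives parameters
    from the down-mix when no parameters were transmitted.
    """
    extracted = received.get("sac_parameters")
    if extracted is not None:
        # Second mode: the signal carries SAC data, so use it as extracted.
        return "extracted", extracted
    # First mode: no SAC data present, so estimate it from the down-mix.
    return "estimated", estimate(received["downmix"])
```

In both modes the returned parameters would be passed to the same spatial audio decoder, which is what allows one decoder to handle both kinds of signal.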
According to an optional feature of the invention, the first set of audio channels consists of two audio channels.
The invention may allow improved decoding of a multi-channel signal that has been down-mixed to a stereo signal.
According to an optional feature of the invention, the first signal is a matrix encoded surround sound signal.
The invention may allow particularly improved decoding of a multi-channel signal that has been down-mixed to a matrix encoded surround sound signal. In particular, experiments have shown that very accurate SAC data can be estimated for matrix encoded surround sound signals based on the stereo channels of the signal.
According to an optional feature of the invention, the decoder further comprises a matrix-surround inverse matrix, and means for determining at least one coefficient of the matrix-surround inverse matrix in response to the estimated parametric data.
This may allow for an improved decoded audio quality for matrix encoded surround signals.
According to another aspect of the present invention, there is provided a method of generating a multi-channel audio signal, the method including: receiving a first signal comprising a first set of audio channels; generating estimated parametric data for a second set of audio channels in response to characteristics of the first set of audio channels, the estimated parametric data relating characteristics of the second set of audio channels to characteristics of the first set of audio channels; and a spatial audio decoder decoding the first signal in response to the estimated parametric data to generate a multi-channel audio signal comprising the second set of channels.
According to another aspect of the invention, a computer program product for performing the method is provided.
According to another aspect of the present invention, there is provided a receiver for generating a multi-channel audio signal, the receiver including: means for receiving a first signal comprising a first set of audio channels; estimating means for generating estimated parametric data for a second set of audio channels in response to characteristics of the first set of audio channels; the estimated parametric data relating characteristics of the second set of audio channels to characteristics of the first set of audio channels; a spatial audio decoder for decoding the first signal in response to the estimated parametric data to generate a multi-channel audio signal comprising a second set of channels.
According to another aspect of the present invention, there is provided a transmission system including: an encoder for generating a first signal comprising a first set of audio channels by encoding a multi-channel signal; a transmitter for transmitting a first signal; means for receiving a first signal; estimating means for generating estimated parametric data for a second set of audio channels in response to characteristics of the first set of audio channels; the estimated parametric data relating characteristics of the second set of audio channels to characteristics of the first set of audio channels; a spatial audio decoder for decoding the first signal in response to the estimated parametric data to generate a decoded multi-channel audio signal comprising a second set of channels.
According to another aspect of the present invention, there is provided a method of transmitting and receiving audio data, the method including: generating a first signal comprising a first set of audio channels by encoding a multi-channel signal; transmitting the first signal; receiving the first signal; generating estimated parametric data for a second set of audio channels in response to characteristics of the first set of audio channels, the estimated parametric data relating characteristics of the second set of audio channels to characteristics of the first set of audio channels; and a spatial audio decoder decoding the first signal in response to the estimated parametric data to generate a decoded multi-channel audio signal comprising the second set of channels.
According to another aspect of the present invention, there is provided an audio playback device comprising a decoder as described above.
These and other aspects, features and advantages of the present invention will be apparent from and elucidated with reference to one or more of the embodiments described hereinafter.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 illustrates a transmission system for communication of audio signals according to some embodiments of the invention;
Fig. 2 illustrates a block diagram of a typical SAC encoder;
Fig. 3 illustrates an example of a typical SAC decoder;
Fig. 4 illustrates a decoder according to some embodiments of the invention;
Fig. 5 illustrates elements of a decoder according to some embodiments of the invention; and
Fig. 6 illustrates a method of generating a multi-channel audio signal according to some embodiments of the invention.
The following description focuses on embodiments of the invention applicable to decoding of a matrixed surround sound signal down-mixed into a stereo signal. However, it will be appreciated that the invention is not limited to this application but may also be applied to many other signals.
Fig. 1 illustrates a transmission system 100 for audio signal communication according to some embodiments of the present invention. The transmission system 100 comprises a transmitter 101 coupled to a receiver 103 via a network 105, which network 105 may specifically be the internet.
In the specific example, the transmitter 101 is a signal recording device and the receiver 103 is a signal playback device, but it will be appreciated that in other embodiments the transmitter and receiver may be used in other applications and for other purposes. For example, the transmitter 101 and/or the receiver 103 may be part of a transcoding functionality and may, for example, provide interfacing with other signal sources or destinations.
In the particular example in which the signal recording function is supported, the transmitter 101 includes a digitizer 107 that receives an analog signal that is converted to a digital PCM signal by sampling and analog-to-digital conversion. The analog signal is specifically a 5.1 surround sound multi-channel signal.
The digitizer 107 is coupled to the encoder 109 of Fig. 1, which encodes the PCM signal according to an encoding algorithm. In particular, the encoder is a matrix encoder that generates a down-mixed stereo signal using the matrix operation of equation 1. Thus, the encoded signal is a matrix encoded surround sound signal.
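The matrix encoding step may be sketched as follows for a single complex-valued sub-band sample per channel. This is a hedged illustration rather than the encoder 109 itself: the function name is hypothetical, multiplication by 1j is used to model the 90 degree phase rotation, and the signs and coefficients of the surround terms in the right down-mix channel are an assumption inferred from the description of equation 1.

```python
# Down-mix coefficients quoted in the description: q and a are 0.707, b is 0.408.
Q = 0.707
A = 0.707
B = 0.408


def matrix_downmix(lf, rf, c, ls, rs, q=Q, a=A, b=B):
    """Down-mix one complex sample per channel to the stereo pair (Lt, Rt).

    Multiplying by 1j models the 90 degree phase rotation of the surround
    channels. The negative surround terms in Rt are an assumption.
    """
    lt = lf + q * c + 1j * a * ls + 1j * b * rs
    rt = rf + q * c - 1j * b * ls - 1j * a * rs
    return lt, rt


# A sample present only in the left surround channel appears with opposite
# sign in Lt and Rt, whereas a center-only sample appears in phase.
surround_lt, surround_rt = matrix_downmix(0, 0, 0, 1.0, 0)
center_lt, center_rt = matrix_downmix(0, 0, 1.0, 0, 0)
```

The anti-phase surround contribution is the property that a matrix decoder, or the estimation described later, can exploit to separate front from back content.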
The encoder 109 is coupled to a network transmitter 111 that receives the encoded signal and interfaces to the internet 105. The network transmitter may transmit the encoded signal to the receiver 103 via the internet 105.
The receiver 103 includes a network receiver 113 that interfaces to the internet 105 and is configured to receive the encoded signal from the transmitter 101.
The network receiver 113 is coupled to a decoder 115. The decoder 115 receives the encoded signal and decodes it according to a decoding algorithm.
In the specific example in which a signal playing function is supported, the receiver 103 further comprises a signal player 117 which receives the decoded audio signal from the decoder 115 and presents this to the user. In particular, the signal player 117 may include a digital-to-analog converter, amplifiers and speakers as required for outputting the decoded audio signal.
In the described embodiment, the decoding algorithm used by the decoder 115 includes a SAC decoding element. For clarity, the operation of a typical SAC encoder will be described first.
Fig. 2 illustrates a block diagram of a typical SAC encoder 200. The encoder 200 divides the incoming signal into separate time-frequency tiles by a Quadrature Mirror Filter (QMF) bank 201. These time/frequency tiles are commonly referred to as "parameter bands".
For each parameter band, the SAC encoding element 203 determines a plurality of spatial parameters describing properties of the spatial image, such as inter-channel level differences and cross-correlation coefficients. In addition to extracting the parameters, the SAC encoding element 203 generates a mono or stereo down-mix from the multi-channel input signal. These signals are converted to the time domain by a QMF synthesis bank 205. The resulting down-mix is fed to a bitstream processor 207 which generates a bitstream comprising the down-mix channels and the parametric data generated by the SAC encoding element 203. Preferably, the down-mix is also encoded (using a conventional mono or stereo 'core' encoder) before transmission, and the bitstream of the core encoder and the spatial parameters are preferably combined (multiplexed) into a single output bitstream.
Depending on the mode of operation, the data rate of this parametric data can cover a wide range of bit rates, from a few kbit/s for good quality multi-channel audio to a few tens of kbit/s for near transparent quality.
Furthermore, in the case of a stereo down-mix, the user has the option of either a traditional stereo down-mix or a down-mix compatible with matrixed-surround systems. In the latter case, the encoder 200 can generate the matrixed-surround compatible down-mix using the matrix approach of equation 1. Alternatively, it may generate a matrixed-surround compatible down-mix using a down-mix post-processing unit operating on a regular stereo down-mix. In this configuration, the encoder can include a matrixed-surround post-processor that modifies the regular stereo down-mix such that it is matrixed-surround compatible by using the spatial parameters extracted by the parameter estimation stage. The advantage of this approach is that the matrixed-surround processing can be completely reversed by decoders that have the spatial parameters available.
The SAC decoder performs in principle the inverse of the encoder. Fig. 3 illustrates an example of a typical SAC decoder. The SAC decoder 300 comprises a splitter 301 which receives the bitstream and splits it into a down-mix signal and parametric data. The decoded down-mix is then processed by a QMF analysis bank 303, resulting in the same parameter bands as those applied in the SAC encoder 200. The spatial synthesis stage 305 reconstructs a multi-channel signal using the parametric data extracted by the splitter 301. Finally, the QMF-domain signal is converted to the time domain by a QMF synthesis bank 307, resulting in the final multi-channel output signal.
Thus, in a system where both the encoder and the decoder comprise SAC functionality, a high quality of the decoded multi-channel signal may be achieved for relatively low data rates. However, since many already deployed systems and much audio material do not make use of SAC functionality, the benefits are typically limited to new systems and re-encoded audio material.
In the example of Fig. 1, the decoder 115 comprises SAC decoding functionality which can be used with non-SAC encoders and non-SAC encoded material. The decoder 115 may thus introduce some of the advantages of SAC without the need for re-encoding or SAC-compatible encoders and may provide a greatly improved quality-to-data rate ratio specifically for multi-channel signals.
Fig. 4 illustrates the decoder 115 of Fig. 1 in more detail. The decoder 115 comprises a receiver 401 which receives a signal comprising a set of audio channels. In particular, the receiver 401 receives a bitstream comprising two channels, which have been generated by the encoder 109 matrix encoding the surround sound signal. The receiver 401 receives the bitstream and generates the two channels y1, y2 of a down-mixed stereo signal. It should be noted that in the specific example, the encoder 109 is a conventional matrix encoder for surround signals, generating a bitstream comprising only two down-mix channels. Thus, in the example, the bitstream does not comprise spatial audio parameter data. In other embodiments, the encoder 109 may be, for example, a SAC encoder generating a matrix surround compatible stereo signal without SAC parametric data.
The decoder 115 further comprises a SAC decoding element 403 coupled to the receiver 401. The SAC decoding element 403 decodes the stereo down-mix channels y1, y2 using SAC techniques as described previously. In particular, the operation of the SAC decoding element 403 corresponds to that described for the SAC decoder 300 of Fig. 3. The SAC decoding element 403 thus generates an output surround sound signal corresponding to the surround signal matrix-encoded by the encoder 109.
As described previously, the stereo down-mix channels may have been encoded by a matrix encoder as described by equation 1. Alternatively, the down-mix channels may have been generated by the SAC encoder 200 comprising a post-processing unit to generate a matrix surround compatible down-mix. In both cases, the SAC decoding element 403 may comprise a pre-processing unit that reverses the operations that the encoder applies for matrix surround compatibility.
The decoder 115 further comprises an estimation processor 405 coupled to the receiver 401 and the SAC decoding element 403. The estimation processor 405 is configured to generate estimated parametric data that can be used to generate the output surround signal. In particular, the estimation processor 405 estimates the parametric data which a SAC encoder would have generated for the down-mix channels if SAC encoding had been performed. Thus, the estimated parametric data relates the characteristics of the output surround channels to the characteristics of the received down-mix channels, as it provides information on how they can be decoded to generate the output surround channels.
In the example of fig. 4, the estimation processor 405 generates the estimated parametric data such that it corresponds to SAC data that the SAC decoding element 403 can directly use to determine the output surround channels.
Thus, the decoder 115 uses the principles of SAC for decoding matrix encoded surround audio material. The estimation processor 405 uses signal cues of the received stereo input signal to determine the data used by the SAC decoding element 403. In particular, the estimation processor 405 estimates inter-channel cues of the received stereo signal and maps these to SAC cues that can be used directly by the SAC decoding element 403. This may specifically allow the SAC decoding element 403 to act as a conventional SAC decoder, thereby facilitating backwards compatibility, reducing design and development requirements and allowing the same functionality to be used for decoding both SAC encoded signals and non-SAC encoded signals. Thus, in the example, the parameters obtained by analyzing the received two-channel down-mix are used at the decoder side to generate the required SAC parameters.
The estimation processor 405 comprises an analysis processor 407 which determines one or more parameters for the stereo down-mix signal. In particular, the analysis processor 407 generates inter-channel level difference (ILD) values and inter-channel correlation coefficient (ICC) values for the stereo down-mix channels y1, y2.
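The analysis step may be sketched as follows for one time-frequency tile. The description does not give formulas for the ILD and ICC, so the definitions below (an energy ratio in decibels and the real part of the normalized cross-correlation) are assumptions based on common parametric stereo practice, and the function name is hypothetical.

```python
import math


def stereo_cues(y1, y2, eps=1e-12):
    """Return (ild_db, icc) for one tile of samples of channels y1 and y2.

    y1 and y2 are equal-length sequences of real or complex sub-band samples.
    eps guards against division by zero and log of zero for silent tiles.
    """
    e1 = sum(abs(s) ** 2 for s in y1)  # energy of channel y1 in the tile
    e2 = sum(abs(s) ** 2 for s in y2)  # energy of channel y2 in the tile
    cross = sum(s1 * s2.conjugate() for s1, s2 in zip(y1, y2))
    ild_db = 10.0 * math.log10((e1 + eps) / (e2 + eps))
    icc = cross.real / math.sqrt((e1 + eps) * (e2 + eps))
    return ild_db, icc
```

With these definitions, identical channels give an ILD near 0 dB and an ICC near 1, while channels in anti-phase give an ICC near -1, which is the kind of cue that surround content mixed in opposite phase would produce.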
The analysis processor 407 is coupled to a mapping processor 409 which maps the ILD and ICC values to SAC values related to the output channels.
The mapping processor 409 specifically exploits the surprising and previously unknown fact that there is typically a close correlation between the ILD and ICC values of the matrix-encoded surround signal and the spatial audio parameters of the original surround sound channels.
The mapping processor 409 can simply use a look-up table to determine the SAC parameter values of the output surround channels associated with the stereo down-mix channels y1, y2. The determined ILD and ICC values, or a representation of them (e.g. after quantization), can be used as addresses of the look-up table. Equally, the mapping processor 409 can calculate a predetermined function having the ICC and ILD values as input parameters and providing the required SAC parameters as output parameters.
In this way, the mapping processor 409 can generate, for example, the following SAC parameters for the output surround sound channels:
Inter-channel level difference between the left front and left surround channels.
Inter-channel level difference between the right front and right surround channels.
Inter-channel correlation coefficients between the left front and left surround channels.
Inter-channel correlation coefficients between the right front and right surround channels.
One or more prediction coefficients for one channel, such as the center channel.
Inter-channel level difference between the center channel of the output surround sound channels and another channel (or combination of channels).
As a specific example, the analysis processor 407 can generate an ICC value and an ILD value for the stereo down-mix channels y1, y2. These two values are then used to generate a unique address for the look-up table. At that address, the SAC parameter values that typically occur for these ICC and ILD values have been stored. The mapping processor 409 thus simply retrieves the stored data values, thereby obtaining suitable estimated parameter data. This data is then fed to the SAC decoding element 403 where it is used in the same way as conventional SAC data generated by a SAC encoder.
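By way of a non-limiting illustration, the addressing of the look-up table by quantized ILD and ICC values may be sketched as follows. The quantization ranges and step counts, as well as the function names, are assumptions made only for this sketch; the description does not specify them.

```python
# Hypothetical quantization grids; ranges and step counts are assumptions.
ILD_MIN_DB, ILD_MAX_DB, ILD_STEPS = -30.0, 30.0, 31
ICC_MIN, ICC_MAX, ICC_STEPS = -1.0, 1.0, 9


def quantize(value, lo, hi, steps):
    """Map value onto an integer index in [0, steps - 1], clamping at the ends."""
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * (steps - 1))


def table_address(ild_db, icc):
    """Combine the two quantized cues into one unique table address."""
    ild_idx = quantize(ild_db, ILD_MIN_DB, ILD_MAX_DB, ILD_STEPS)
    icc_idx = quantize(icc, ICC_MIN, ICC_MAX, ICC_STEPS)
    return ild_idx * ICC_STEPS + icc_idx


def lookup_sac_parameters(table, ild_db, icc):
    """Return the SAC parameter set stored at the address of the given cues."""
    return table[table_address(ild_db, icc)]
```

Here `table` is any sequence holding one stored SAC parameter set per address, i.e. `ILD_STEPS * ICC_STEPS` entries in this sketch.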
It will be appreciated that the corresponding SAC parameter values for a given ILD and ICC value can be determined in any suitable manner. For example, simulations may be performed in which a large number of signals are encoded using both matrix encoding and SAC encoding. ICC and ILD values may then be derived for the matrix encoded signal and compared to parametric data generated by the SAC encoder. The data can be statistically processed to determine SAC parameters that are most likely to exist for a given ILD and ICC value, and can then be stored in the appropriate locations of the lookup table. It should be appreciated that the analysis is only required once and that the determined look-up table can be used by many decoders and for any received signal.
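As an illustrative sketch only, the one-off construction of such a look-up table from simulation data may proceed along the following lines. The use of a per-address mean as the statistical estimate, the data layout and all names are assumptions of this sketch; any other suitable statistic could be used instead.

```python
from collections import defaultdict


def build_lookup_table(training_pairs, address_of, num_addresses):
    """Build a table of typical SAC parameters per (ILD, ICC) address.

    training_pairs: iterable of ((ild_db, icc), sac_params), where the cues
        come from the matrix-encoded down-mix and sac_params is a tuple of
        floats produced by a SAC encoder for the same time-frequency tile.
    address_of: function mapping (ild_db, icc) to an integer table address.
    Returns a list holding the mean parameter tuple per address, or None
    for addresses that never occurred in the training data.
    """
    sums = {}
    counts = defaultdict(int)
    for (ild_db, icc), params in training_pairs:
        addr = address_of(ild_db, icc)
        if addr not in sums:
            sums[addr] = [0.0] * len(params)
        for i, p in enumerate(params):
            sums[addr][i] += p
        counts[addr] += 1
    table = [None] * num_addresses
    for addr, total in sums.items():
        table[addr] = tuple(s / counts[addr] for s in total)
    return table
```

Addresses left as `None` would need a fallback, for example the value of a neighbouring address; that choice is outside the scope of this sketch.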
In practice, experiments and simulations have demonstrated that there is a close correlation between ILD and ICC values of matrix encoded down-mixed surround sound signals and SAC parameters of SAC encoded surround sound signals. Thus, SAC parameters can be estimated with relatively high accuracy and a greatly improved decoded audio quality can be achieved.
In the example of Fig. 4, the estimation processor 405 operates on a time-frequency tile basis.
Specifically, the stereo down-mix channels y1, y2 are first processed by a complex-modulated QMF filter bank to generate individual time-frequency tiles. It should be appreciated that this processing may be shared between the estimation processor 405 and the SAC decoding element 403 and may be implemented, for example, in the SAC decoding element 403. The generation of time-frequency tiles containing frequency bands for time intervals is known to the person skilled in the art and will not be described in detail (one example can be found in Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). "Parametric coding of stereo audio". EURASIP J. Applied Signal Proc., 9: 1305-).
A time-frequency tile is formed by aggregating specific frequency bands and time slices. Typically, in accordance with psychoacoustic principles, these time-frequency tiles are relatively narrow at low frequencies and wide at high frequencies. The corresponding time resolution is typically between 11 and 50 ms.
For each generated time-frequency tile, the analysis processor 407 generates two parameters, ILD and ICC, from the stereo down-mix channels y1, y2. Specifically, if Y1[k,q] denotes the (complex-valued) filter bank output of signal y1 for filter output q and time sample k, and Y2[k,q] denotes the corresponding QMF domain representation of y2, the ILD parameter for parameter band b is given by:
$$\mathrm{ILD}[b] = 10 \log_{10} \frac{\sum_k \sum_q Y_1[k,q]\, Y_1^*[k,q]}{\sum_k \sum_q Y_2[k,q]\, Y_2^*[k,q]}$$
where the summation over k is performed over the QMF domain time samples of the current time-frequency tile, the summation over q is performed over those filter bank outputs corresponding to parameter band b, and (*) denotes the complex conjugate.
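Similarly, using $\Re\{\cdot\}$ to denote the real part, the ICC value for the parameter band b is given by:
$$\mathrm{ICC}[b] = \frac{\Re\left\{\sum_k \sum_q Y_1[k,q]\, Y_2^*[k,q]\right\}}{\sqrt{\left(\sum_k \sum_q Y_1[k,q]\, Y_1^*[k,q]\right)\left(\sum_k \sum_q Y_2[k,q]\, Y_2^*[k,q]\right)}}$$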
For each pair of ICC and ILD values, the mapping processor 409 may then perform a table look-up and determine:
the ILD between corresponding time-frequency tiles of the left front and left surround channels;
the ILD between corresponding time-frequency tiles of the right front and right surround channels;
the ICC between corresponding time-frequency tiles of the left front and left surround channels;
the ICC between corresponding time-frequency tiles of the right front and right surround channels;
prediction coefficients for generating the center channel from the down-mix; and/or
the ILD between the center channel and any other channel (pair).
The decoder is thus fed with estimated parametric data, which corresponds to SAC parametric data that has been generated by the SAC encoder.
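Purely as an illustration, the per-tile analysis performed by the analysis processor 407 may be sketched as follows. The sketch assumes that the QMF samples are held as nested lists indexed as Y[k][q], that the ICC is the real part of the normalized cross-correlation of the two down-mix channels, and that a small constant `eps` (not part of the description) guards against division by zero; all function names are hypothetical.

```python
import math


def band_powers(Y1, Y2, time_slots, bands):
    """Accumulate the two channel powers and the cross term over one tile.

    Y1, Y2: complex QMF samples indexed as Y[k][q].
    time_slots: time sample indices k belonging to the tile.
    bands: filter bank outputs q belonging to parameter band b.
    """
    p1 = p2 = 0.0
    cross = 0j
    for k in time_slots:
        for q in bands:
            a, b = Y1[k][q], Y2[k][q]
            p1 += (a * a.conjugate()).real
            p2 += (b * b.conjugate()).real
            cross += a * b.conjugate()
    return p1, p2, cross


def ild_db(Y1, Y2, time_slots, bands, eps=1e-12):
    """Inter-channel level difference of the tile in dB."""
    p1, p2, _ = band_powers(Y1, Y2, time_slots, bands)
    return 10.0 * math.log10((p1 + eps) / (p2 + eps))


def icc(Y1, Y2, time_slots, bands, eps=1e-12):
    """Inter-channel correlation: real part of the normalized cross term."""
    p1, p2, cross = band_powers(Y1, Y2, time_slots, bands)
    return cross.real / math.sqrt((p1 + eps) * (p2 + eps))
```

The resulting ILD and ICC pair for each tile can then serve as input to the table look-up or mapping function of the mapping processor 409.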
Fig. 5 illustrates elements of the SAC decoding element 403 in more detail.
The SAC decoding element 403 comprises a pre-mix matrix unit 501, whose output signals are fed to a second mix matrix unit 503 and to a set of decorrelators (D1 to Dm) 505. The second mix matrix unit 503 generates the output signals based on the decorrelator outputs and the direct output of the pre-mix matrix unit 501. The operation of SAC is well known to those skilled in the art and will not be described further herein for the sake of clarity and brevity. Further details may be found, for example, in Herre et al., "The reference model architecture for MPEG spatial audio coding", Proc. 118th AES Convention, Barcelona, Spain, 2005.
The estimated parameter data received from the estimation processor 405 is used to control the pre-mix matrix unit 501 and the second mix matrix unit 503 as if it were conventional SAC parameter data. In particular, the pre-mix matrix unit 501 may use the pre-mix matrix M1 to generate three intermediate signals l, r and c from the input signals y1, y2, such as:
$$\begin{bmatrix} l \\ r \\ c \end{bmatrix} = M_1 \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
wherein the entries of M1 depend on c1 and c2, which represent two of the spatial parameters (prediction coefficients) generated by the mapping processor 409. Two decorrelators D1 and D2 505 are fed with the signals l and r, respectively. Finally, the output signals lf, rf, c, ls and rs for the left front, right front, center, left surround and right surround channels are generated by means of a post-mix matrix M2 in the second mix matrix unit 503, wherein the entries h_xy,X of M2 depend on the ILD and ICC parameters generated by the mapping processor 409:
$$h_{11,X} = p_{1,X} \cos(v_X + \mu_X)$$
$$h_{12,X} = p_{1,X} \sin(v_X + \mu_X)$$
$$h_{21,X} = p_{2,X} \cos(v_X - \mu_X)$$
$$h_{22,X} = p_{2,X} \sin(v_X - \mu_X)$$
wherein
$$\mu_X = \frac{1}{2} \arccos(\mathrm{ICC}_X)$$
$$v_X = \frac{\mu_X \, (p_{2,X} - p_{1,X})}{\sqrt{2}}$$
Here, ILD_X and ICC_X represent the ILD and ICC parameters for channel pair X (left front/left surround, or right front/right surround) generated by the mapping processor 409.
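As a minimal sketch under stated assumptions, the four mixing coefficients for one channel pair X may be computed from the relations above as follows. The gains p1_x and p2_x are treated as given inputs, since their derivation is not reproduced in this passage; the clamping of the ICC value and the function name are additions made only for this sketch.

```python
import math


def upmix_gains(icc_x, p1_x, p2_x):
    """Return ((h11, h12), (h21, h22)) for one channel pair X.

    icc_x: estimated ICC for the pair, expected within [-1, 1].
    p1_x, p2_x: gains of the two channels of the pair (taken as given).
    """
    # Clamp so that small numerical overshoots do not break arccos.
    icc_clamped = max(-1.0, min(1.0, icc_x))
    mu = 0.5 * math.acos(icc_clamped)
    v = mu * (p2_x - p1_x) / math.sqrt(2.0)
    h11 = p1_x * math.cos(v + mu)
    h12 = p1_x * math.sin(v + mu)
    h21 = p2_x * math.cos(v - mu)
    h22 = p2_x * math.sin(v - mu)
    return (h11, h12), (h21, h22)
```

For fully correlated channels (ICC of 1) both angles vanish, so that h12 and h22 become zero and no decorrelated signal is mixed in.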
In case the SAC encoder operates in a matrix-surround compatible mode by means of an encoder post-processor, a corresponding decoder-side pre-processor may be included in the pre-mix matrix unit 501. In this particular case, an alternative pre-mix matrix may be used, which is derived from the original pre-mix matrix M1 and the matrix-surround compatibility inverse matrix Q:
$$M_1' = M_1 Q = \begin{bmatrix} c_1 + 2 & c_2 - 1 \\ c_1 - 1 & c_2 + 1 \\ 1 - c_1 & 1 - c_2 \end{bmatrix} Q,$$
wherein the entries q_xy,z of the matrix-surround inverse matrix Q are functions of the parameters generated by the mapping processor 409, with g1 = g2 = 0.577, and with w_l and w_r likewise being functions of the parameters given by the mapping processor 409.
Alternatively, the entries of M1 or M1' may also be generated directly by the mapping processor 409, bypassing the formulas given above.
It will be appreciated that whilst the above description has focussed on an embodiment in which the received signal does not include SAC parameter data, in other embodiments some parameter data may be included in the received signal. For example, the received signal may comprise parametric data relating to some output channels but not to other output channels, and the estimated parameters may be used for these other channels. As another example, the estimated parameter data may be used instead of parameter data that has been corrupted, e.g. due to transmission errors. Thus, the estimated parametric data may be used to enhance and supplement other parametric data received from the encoder.
Furthermore, it will be appreciated that one of the advantages of the described example is that the SAC decoding element 403 is able to use standard SAC decoding techniques. Therefore, the SAC decoding element 403 can equally be used to decode a conventional SAC signal received from a SAC encoder.
In particular, the transmission system 100 of fig. 1 may include a plurality of non-SAC encoders and a plurality of SAC encoders. Decoder 115 may alter its operation based on the signal being received. Thus, if a non-SAC signal is received, the operation may be as described above. However, if a SAC signal is received, the parametric data may simply be extracted and fed into the SAC decoding element 403 together with the down-mix channels. Thus, a higher flexibility of the decoder can be achieved.
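By way of illustration only, the selection between received and estimated parametric data may be sketched as follows. The representation of the parameters as a dictionary, the use of None for missing or corrupted values, and the function name are assumptions of this sketch.

```python
def select_parameters(received_params, estimate):
    """Choose the parameter set fed to the SAC decoding element.

    received_params: dict mapping parameter name to value, with None for
        values that are missing or corrupted, or None if the received
        signal carries no SAC parameter data at all.
    estimate: zero-argument callable returning a dict of parameters
        estimated from the down-mix channels.
    """
    if received_params is None:
        # Non-SAC signal: rely entirely on the estimated parameters.
        return estimate()
    if all(value is not None for value in received_params.values()):
        # Complete SAC signal: no estimation is needed.
        return dict(received_params)
    # Partial or corrupted data: fill only the gaps with estimates.
    estimated = estimate()
    return {
        name: value if value is not None else estimated[name]
        for name, value in received_params.items()
    }
```

Passing the estimator as a callable means the analysis of the down-mix is only carried out when it is actually needed.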
Fig. 6 illustrates a method of generating a multi-channel audio signal according to some embodiments of the invention. The method is applicable to the decoder 115 of fig. 4 and will be described with reference thereto.
The method begins at step 601 where the receiver 401 receives a first signal comprising a first set of audio channels.
Step 601 is followed by step 603 wherein the estimation processor 405 generates estimated parametric data for the second set of audio channels in response to characteristics of the first set of audio channels. The estimated parametric data relates characteristics of the second set of audio channels to characteristics of the first set of audio channels.
Step 603 is followed by step 605 wherein the SAC decoding element 403 decodes the first signal in response to the estimated parametric data to generate a multi-channel signal comprising the second set of audio channels.
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. It will be apparent, however, that any suitable distribution of functionality between different functional units and processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or mechanism.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly in the form of computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. Thus, the invention may be implemented in a single unit or in a physically and functionally distributed form between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the invention is limited only by the appended claims. In addition, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the word "comprising" does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus, references to "a", "an", "first", "second", etc., do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.