EP2656342A1 - Verbesserte stereoparametrische kodierung/dekodierung für gegenphasige kanäle - Google Patents

Verbesserte stereoparametrische kodierung/dekodierung für gegenphasige kanäle (Improved parametric stereo coding/decoding for channels in phase opposition)

Info

Publication number
EP2656342A1
EP2656342A1 (application EP11785726.8A)
Authority
EP
European Patent Office
Prior art keywords
channel
stereo
signal
phase difference
phase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP11785726.8A
Other languages
English (en)
French (fr)
Inventor
Stéphane RAGOT
Thi Minh Nguyet Hoang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
Orange SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orange SA filed Critical Orange SA
Publication of EP2656342A1


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • the present invention relates to the field of coding/decoding of digital signals.
  • the coding and decoding according to the invention is particularly suitable for the transmission and/or storage of digital signals such as audio-frequency signals (speech, music or other).
  • the present invention relates to the parametric coding/decoding of multichannel audio signals, especially stereophonic signals hereinafter called stereo signals.
  • This type of coding/decoding is based on the extraction of spatial information parameters so that at decoding, these spatial characteristics can be reconstructed for the listener, in order to recreate the same spatial image as in the original signal.
  • Such a parametric coding/decoding technique is for example described in the document by J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, titled "Parametric Coding of Stereo Audio" in EURASIP Journal on Applied Signal Processing 2005:9, 1305-1322. This example is recalled with reference to FIGS. 1 and 2, which respectively describe a parametric stereo encoder and decoder.
  • FIG. 1 describes an encoder receiving two audio channels, a left channel L(n) and a right channel R(n).
  • the temporal channels L(n) and R(n), where n is the integer index of the samples, are processed by the blocks 101, 102, 103 and 104, respectively, which perform a short-term Fourier analysis.
  • the transformed signals L [j] and R [j], where j is the integer index of the frequency coefficients, are thus obtained.
  • Block 105 performs a channel reduction processing ("downmix") to obtain, in the frequency domain, from the left and right signals, a monophonic signal hereinafter called mono signal, which is here a sum signal.
  • Extraction of spatial information parameters is also performed in block 105.
  • the extracted parameters are as follows.
  • the ICLD ("InterChannel Level Difference") parameters, or interchannel intensity differences, characterize the energy ratios per frequency subband between the left and right channels.
  • L[j] and R[j] correspond to the (complex) spectral coefficients of the L and R channels; the values B[k] and B[k+1], for each frequency band of index k, define the division of the discrete spectrum into subbands, and the symbol * indicates the complex conjugate.
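  • As an illustration, the ICLD computation recalled above can be sketched as follows in Python. This is a non-normative sketch: the 10·log10 energy-ratio (dB) convention is the usual one and is assumed here, the subband boundaries B[k] follow the definition above, and the function and variable names are ours.

```python
import numpy as np

def icld_per_subband(L, R, B, eps=1e-12):
    """Inter-channel level differences (in dB) per frequency subband.

    L, R : complex spectra of the left and right channels.
    B    : subband boundaries; subband k spans bins B[k] .. B[k+1]-1,
           as in the definition above.
    """
    icld = np.empty(len(B) - 1)
    for k in range(len(B) - 1):
        eL = np.sum(np.abs(L[B[k]:B[k + 1]]) ** 2)   # energy of L in subband k
        eR = np.sum(np.abs(R[B[k]:B[k + 1]]) ** 2)   # energy of R in subband k
        icld[k] = 10.0 * np.log10((eL + eps) / (eR + eps))
    return icld
```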
  • ICTD InterChannel Time Difference
  • the ICC for "InterChannel Coherence" parameters represent inter-channel correlation (or coherence) and are associated with the spatial width of the sound sources; their definition is not recalled here, but it is noted in the article by Breebaart et al. that the ICC parameters are not necessary in the subbands reduced to a single frequency coefficient - in fact the amplitude and phase differences completely describe the spatialization in this "degenerate" case.
  • ICLD, ICPD and ICC parameters are extracted by analysis of the stereo signals, by the block 105. If the ICTD parameters were also coded, these could also be extracted by subband from the spectra L[j] and R[j]; however, the extraction of the ICTD parameters is in general simplified by assuming an identical inter-channel time shift for each subband, and in this case these parameters can be extracted from the time channels L(n) and R(n) via cross-correlations.
  • the mono signal M[j] is transformed into the time domain (blocks 106 to 108) by short-term Fourier synthesis (inverse FFT, windowing and overlap-add), and a mono coding (block 109) is then performed.
  • the stereo parameters are quantized and coded in block 110.
  • the spectrum of the signals (L[j], R[j]) is divided according to a nonlinear frequency scale of ERB (Equivalent Rectangular Bandwidth) or Bark type, with a number of subbands typically ranging from 20 to 34 for a signal sampled at 16 to 48 kHz. This scale defines the values of B[k] and B[k+1] for each subband k.
  • the parameters (ICLD, ICPD, ICC) are encoded by scalar quantization possibly followed by entropy coding and/or differential coding.
  • the ICLD is encoded by a non-uniform quantizer (ranging from -50 to +50 dB) with differential entropy coding.
  • the non-uniform quantization step exploits the fact that the higher the value of the ICLD, the lower the auditory sensitivity to variations of this parameter.
  • For the coding of the mono signal (block 109), several quantization techniques with or without memory are possible, for example Pulse Code Modulation (PCM, known in French as MIC), its adaptive version called Adaptive Differential Pulse Code Modulation (ADPCM), or more advanced techniques such as perceptual transform coding or Code Excited Linear Prediction (CELP) coding.
  • PCM Pulse Code Modulation (MIC)
  • ADPCM Adaptive Differential Pulse Code Modulation
  • CELP Code Excited Linear Prediction
  • ITU-T Recommendation G.722, for example, uses ADPCM (Adaptive Differential Pulse Code Modulation).
  • ADPCM Adaptive Differential Pulse Code Modulation
  • the input signal of a G.722-type encoder is a wideband signal with a minimum bandwidth of [50-7000 Hz] and a sampling frequency of 16 kHz.
  • This signal is decomposed into two subbands [0-4000 Hz] and [4000-8000 Hz] obtained by decomposition of the signal by quadrature mirror filters (QMF), then each of the subbands is encoded separately by an ADPCM encoder.
  • QMF Quadrature Mirror Filters
  • the low band is coded by an embedded-code ADPCM coding on 6, 5 and 4 bits, while the high band is coded by an ADPCM coder at 2 bits per sample.
  • the total bit rate is 64, 56 or 48 kbit/s depending on the number of bits used for decoding the low band.
  • a quantized signal frame according to the G.722 standard consists of quantization indices coded on 6, 5 or 4 bits per low-band sample (0-4000 Hz) and 2 bits per high-band sample (4000-8000 Hz). Since the transmission frequency of the scalar indices is 8 kHz in each subband, the bit rate is 64, 56 or 48 kbit/s.
  • In the decoder, the mono signal is decoded (block 201) and a decorrelator is used (block 202) to produce two versions M(n) and M'(n) of the decoded mono signal.
  • This decorrelation makes it possible to increase the spatial width of the mono source M(n) and thus to avoid it being perceived as a point source.
  • These two signals M(n) and M'(n) are passed into the frequency domain (blocks 203 to 206) and the decoded stereo parameters (block 207) are used by the stereo synthesis (or spatial shaping) (block 208) to reconstruct the left and right channels in the frequency domain. These channels are finally reconstructed in the time domain (blocks 209 to 214).
  • the block 105 performs a channel reduction processing or "downmix” by combining the stereo channels (left, right) to obtain a mono signal which is then encoded by a mono encoder.
  • the spatial parameters (ICLD, ICPD, ICC, ...) are extracted from the stereo channels and transmitted in addition to the bitstream from the mono encoder.
  • One approach is the passive "downmix", which corresponds to a direct matrixing of the stereo channels to combine them into a single signal.
  • the compensation parameter can be set as follows:
  • the gains w1, w2 are in general adapted to the short-term signal, in particular to align the phases.
  • the phase of the channel L for each frequency sub-band is chosen as the reference phase
  • the channel R is aligned according to the phase of the channel L for each sub-band by the following formula:
  • R'[k] = e^(i·ICPD[b]) · R[k]    (8)
  • where:
  • R'[k] is the aligned R channel
  • k is the index of a coefficient in the b-th frequency band
  • ICPD[b] is the inter-channel phase difference in the b-th frequency subband, given by:
  • phase alignment therefore conserves energy and avoids attenuation problems by eliminating the influence of the phase.
  • This "downmix" corresponds to the "downmix" described in the document by Breebaart et al., namely:
  • An ideal conversion of a stereo signal to a mono signal should avoid attenuation problems for all frequency components of the signal.
  • This "downmix" operation is important for parametric stereo coding because the decoded stereo signal is only a spatial shaping of the decoded mono signal.
  • the downmix technique in the frequency domain described above retains the energy level of the stereo signal in the mono signal by aligning the R channel and the L channel before processing. This phase alignment avoids situations where the channels are in phase opposition.
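  • A minimal sketch of such a per-subband phase alignment (formula (8) above) is given below. The computation of ICPD[b] as the angle of the sum of L[k]·R*[k] over the subband is the usual definition and is assumed here, since the exact formula is not reproduced in the text; the function names are ours.

```python
import numpy as np

def align_r_to_l(L, R, B):
    """Phase-align the R channel onto the L channel per subband,
    in the spirit of formula (8): R'[k] = exp(i*ICPD[b]) * R[k].

    L, R : complex spectra; B : subband boundaries.
    """
    R_aligned = np.empty_like(R)
    for b in range(len(B) - 1):
        sl = slice(B[b], B[b + 1])
        # assumed definition of ICPD[b]: angle of sum_k L[k] * conj(R[k])
        icpd_b = np.angle(np.sum(L[sl] * np.conj(R[sl])))
        R_aligned[sl] = np.exp(1j * icpd_b) * R[sl]   # rotate R onto the phase of L
    return R_aligned
```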
  • In certain cases, the phase of the mono signal after "downmix" becomes constant and the resulting mono signal will generally be of poor quality; similarly, if the reference channel is a random signal (ambient noise, etc.), the phase of the mono signal may become random or poorly conditioned, again giving a mono signal which will generally be of poor quality.
  • the amplitude of M[k] is the average of the amplitudes of the L and R channels.
  • the phase of M[k] is given by the phase of the signal summing the two stereo channels (L + R).
  • the method of Hoang et al. preserves the energy of the mono signal like the Samsudin et al. method, and it avoids the problem of total dependence on one of the stereo channels (L or R).
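  • A minimal sketch of the "downmix" recalled in the preceding items (amplitude of M[k] equal to the average of the amplitudes, phase of M[k] taken from the sum signal L + R) could read as follows; this is only a sketch of the description given in the text, not the published algorithm, and the names are ours.

```python
import numpy as np

def downmix_avg_amplitude_sum_phase(L, R):
    """Per-bin downmix: |M| = (|L| + |R|) / 2, phase of M = phase of (L + R)."""
    amplitude = 0.5 * (np.abs(L) + np.abs(R))
    # the phase term is poorly conditioned when L and R are close to phase opposition,
    # which is precisely the situation the invention addresses
    phase = np.angle(L + R)
    return amplitude * np.exp(1j * phase)
```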
  • the invention improves the situation of the state of the art.
  • For this purpose, the invention proposes a method of parametric coding of a stereo audio signal comprising a step of coding a mono signal resulting from a channel reduction processing applied to the stereo signal, and a step of coding spatialization information of the stereo signal.
  • the method is such that the channel reduction process comprises the following steps:
  • the channel reduction processing makes it possible to solve at the same time the problems related to the stereo channels in quasi-phase opposition and the problem of possible dependence of the processing on the phase of a reference channel (L or R). Indeed, since this processing involves a modification of one of the stereo channels by rotation of an angle less than the value of the phase difference of the stereo channels (ICPD), to obtain an intermediate channel, it makes it possible to obtain an angular interval adapted to the calculation of a mono signal whose phase (by frequency subband) does not depend on a reference channel. Indeed, the channels thus modified are not aligned in phase.
  • the quality of the mono signal obtained from the channel reduction processing is improved, especially in the case where the stereo signals are in phase opposition or close to phase opposition.
  • the mono signal is determined according to the following steps:
  • the intermediate mono signal has a phase that does not depend on a reference channel because the channels from which it is obtained are not aligned in phase.
  • since the channels from which the intermediate mono signal is obtained are also not in phase opposition, even if the original stereo channels are, the resulting quality problem is solved.
  • the intermediate channel is obtained by rotation of the first predetermined channel by half (ICPD[j]/2) of the determined phase difference.
  • the spatialization information includes a first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second information comprising, by frequency subband, the phase difference defined between the mono signal and a first predetermined stereo channel.
  • the phase difference between the mono signal and the predetermined stereo channel is a function of the phase difference between the intermediate mono signal and the second channel of the stereo signal.
  • the first predetermined channel is the so-called dominant channel whose amplitude is the highest among the channels of the stereo signal.
  • the dominant channel is determined in the same way at the encoder and at the decoder, without exchange of information.
  • This dominant channel then serves as a reference for determining the phase differences useful for the channel reduction processing at the encoder or for the synthesis of the stereo signals at the decoder.
  • the first predetermined channel is the so-called dominant channel for which the amplitude of the locally decoded corresponding channel is the highest among the channels of the stereo signal.
  • the determination of the dominant channel is done on values locally decoded at the encoder, which are therefore identical to those decoded at the decoder.
  • the amplitude of the mono signal is calculated as a function of amplitude values of the locally decoded stereo channels.
  • the amplitude values thus correspond to the real decoded values and make it possible to obtain at decoding a better quality of spatialization.
  • the first piece of information is coded by a first coding layer and the second piece of information is coded by a second coding layer.
  • the present invention also relates to a method for parametric decoding of a stereo audio signal comprising a step of decoding a received mono signal resulting from a channel reduction processing applied to the original stereo signal, and a step of decoding spatialization information of the original stereo signal.
  • the method is such that the spatialization information includes a first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second piece of information comprising, by frequency subband, the phase difference defined between the mono signal and a first predetermined stereo channel.
  • the method also comprises the following steps: from the phase difference defined between the mono signal and a first predetermined stereo channel, calculating a phase difference between an intermediate mono channel and the first predetermined channel for a set of frequency subbands;
  • the spatialization information makes it possible to find the phase differences adapted to perform the synthesis of the stereo signals.
  • the signals obtained have a conserved energy compared to the original stereo signals over the entire frequency spectrum, with good quality even for original signals in phase opposition.
  • the first predetermined stereo channel is the so-called dominant channel whose amplitude is the strongest among the channels of the stereo signal.
  • the first information on the amplitude of the stereo channels is decoded by a first decoding layer and the second information is decoded by a second decoding layer.
  • the invention also relates to a parametric encoder of a stereo audio signal comprising a module for coding a mono signal coming from a channel reduction processing module applied to the stereo signal, and modules for coding spatialization information of the stereo signal.
  • the encoder is such that the channel reduction processing module comprises:
  • the invention also relates to a parametric decoder of a stereo audio signal comprising a module for decoding a received mono signal resulting from a channel reduction processing applied to the original stereo signal, and modules for decoding spatialization information of the original stereo signal.
  • the decoder is such that the spatialization information includes a first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second piece of information comprising, by frequency subband, the phase difference defined between the mono signal and a first predetermined stereo channel.
  • the decoder comprises:
  • the invention relates to a computer program comprising code instructions for implementing the steps of a coding method according to the invention and / or a decoding method according to the invention.
  • the invention finally relates to a storage means readable by a processor storing a computer program as described.
  • FIG. 1 illustrates an encoder implementing a parametric coding known from the state of the art and previously described
  • FIG. 2 illustrates a decoder implementing a parametric decoding known from the state of the art and previously described
  • FIG. 3 illustrates a stereo parametric encoder according to one embodiment of the invention
  • FIGS. 4a and 4b illustrate, in the form of flowcharts, the steps of a coding method according to alternative embodiments of the invention
  • FIG. 5 illustrates a method of calculating spatialization information in a particular embodiment of the invention
  • FIGS. 6a and 6b illustrate the bitstream of spatialization information coded in a particular embodiment
  • FIGS. 7a and 7b illustrate in one case the non-linearity of the phase of the mono signal in an example of coding not implementing the invention and in the other case in a coding implementing the invention;
  • FIG. 8 illustrates a decoder according to one embodiment of the invention
  • FIG. 9 illustrates a calculation mode according to one embodiment of the invention, phase differences for the synthesis of the stereo signals at the decoder, on the basis of the spatialization information
  • FIGS. 10a and 10b illustrate, in the form of flowcharts, the steps of a decoding method according to alternative embodiments of the invention
  • FIGS. 11a and 11b respectively illustrate hardware examples of devices incorporating an encoder and a decoder able to implement the coding method and the decoding method, according to one embodiment of the invention.
  • This parametric stereo encoder uses a G.722 mono coding at 56 or 64 kbit/s.
  • Each time channel (L(n) and R(n)) sampled at 16 kHz is first pre-filtered by a high-pass filter (HPF) eliminating components below 50 Hz (blocks 301 and 302).
  • the channels L'(n) and R'(n) coming from the pre-filtering blocks are analyzed in frequency by discrete Fourier transform with 50% overlapping sinusoidal windowing of length 10 ms, i.e. 160 samples (blocks 303 to 306).
  • the signal (L'(n), R'(n)) is weighted by a symmetric analysis window covering 2 frames of 5 ms, i.e. 10 ms (160 samples).
  • the 10ms analysis window covers the current frame and the future frame.
  • the future frame corresponds to a "future" signal segment commonly called "lookahead" of 5 ms.
  • the coefficients of index 0 < j < 80 are complex and correspond to a 100 Hz wide subband centered on the frequency associated with index j.
  • Spectra L [j] and R [j] are combined in block 307 described later to obtain a mono (downmix) signal M [j] in the frequency domain.
  • This signal is converted into time by inverse FFT and windowing-overlap with the "lookahead" part of the previous frame (blocks 308-310).
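  • For orientation, the analysis/synthesis framing described above (5 ms frames at 16 kHz, symmetric sinusoidal window of 10 ms / 160 samples with 50% overlap, inverse FFT and overlap-add) can be sketched as follows. The exact window of the coder is not specified in the text, so the sine window below, and the helper names, are assumptions of this sketch.

```python
import numpy as np

FS = 16000   # sampling rate (Hz)
FRAME = 80   # 5 ms frame
WIN = 160    # 10 ms window: current frame + 5 ms "lookahead"

# symmetric sine window; with 50% overlap, analysis x synthesis windows sum to 1
window = np.sin(np.pi * (np.arange(WIN) + 0.5) / WIN)

def analyse(two_frames):
    """two_frames: 160 samples (current frame followed by the lookahead frame)."""
    return np.fft.rfft(window * two_frames)       # complex bins spaced 100 Hz apart

def synthesise(spectrum, previous_tail):
    """Inverse FFT, synthesis windowing and overlap-add with the previous tail."""
    x = window * np.fft.irfft(spectrum, WIN)
    return x[:FRAME] + previous_tail, x[FRAME:]   # (finished 5 ms, tail for next call)
```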
  • a delay of 2 frames must be introduced into the codec.
  • the delay of 2 frames is specific to the detailed implementation here, in particular it is related to symmetrical sinusoidal windows of 10 ms.
  • This delay could be different.
  • it would be possible to obtain a delay of one frame with an optimized window with a smaller overlap between adjacent windows, with a block 311 not introducing a delay (T = 0).
  • the block 313 introduces a delay of two frames on the spectra L[j], R[j] and M[j] in order to obtain the spectra Lbuf[j], Rbuf[j] and Mbuf[j].
  • the coding of the stereo spatial information is implemented in the blocks 314 to 316.
  • the stereo parameters are extracted (block 314) and coded (blocks 315 and 316) from the spectra L[j], R[j] and M[j] offset by two frames: Lbuf[j], Rbuf[j] and Mbuf[j].
  • the channel reduction processing block 307 or "downmix" is now described in more detail.
  • the latter performs a "downmix" in the frequency domain to obtain a mono signal M [j].
  • the principle of channel reduction processing is carried out according to steps E400 to E404 or according to steps E410 to E414 illustrated in FIGS. 4a and 4b. These figures show two equivalent variants from a result point of view.
  • a first step E400 determines the phase difference, by frequency line j, between the L and R channels defined in the frequency domain.
  • This phase difference corresponds to the ICPD parameters as described above and defined by the following formula:
  • In step E401, a modification of the stereo channel R is performed to obtain an intermediate channel R'.
  • the determination of this intermediate channel is effected by rotating the channel R by an angle obtained by reducing the phase difference determined in step E400.
  • the modification is effected by rotating the initial channel R by an angle of ICPD/2 to obtain the channel R' according to the following formula:
  • the phase difference between the two channels of the stereo signal is reduced by half to obtain the intermediate channel R '.
  • In a variant, the rotation is by a different angle, for example an angle of 3·ICPD[j]/4.
  • the phase difference between the two channels of the stereo signal is reduced by 3/4 to obtain the intermediate channel R '
  • In step E402, an intermediate mono signal is calculated from the channels L[j] and R'[j]. This calculation is done per frequency coefficient.
  • the amplitude of the intermediate mono signal is obtained as the average of the amplitudes of the intermediate channel R' and of the channel L, and the phase is obtained as the phase of the signal summing the second channel L and the intermediate channel R' (L + R'), according to the following formula:
  • In step E403, the phase difference (α'[j]) between the intermediate mono signal and the second channel of the stereo signal, here the channel L, is calculated. This difference is expressed as follows:
  • Step E404 determines the mono signal M by rotating the intermediate mono signal by an angle which is a function of the phase difference α'[j].
  • the mono signal M is calculated according to the following formula:
  • FIG. 5 illustrates the phase differences mentioned in the method described in FIG. 4a and thus shows the mode of calculating these phase differences.
  • FIG. 4b shows a second variant of the "downmix" method, in which the modification of the stereo channel is performed on the channel L (instead of R) rotated by an angle of -ICPD / 2 (instead of ICPD / 2) to obtain an intermediate channel L '(instead of R').
  • Steps E410 to E414 are not presented here in detail because they correspond to steps E400 to E404 adapted to the fact that the modified channel is no longer R 'but L'.
  • the mono signals M obtained from the channels L and R 'or the channels R and L' are identical.
  • the mono signal M is independent of the stereo channel to be modified (L or R) for a modification angle of ICPD / 2.
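  • The steps E400 to E404 can be summarised by the per-bin sketch below. The text states that the final rotation of step E404 is a function of the phase difference α'[j] without reproducing the exact formula; the sketch therefore assumes the simplest such reading, namely that M is obtained by rotating M' by α'[j] (so that the angle between L and M is twice the angle between L and M'). With that assumption the same M is obtained whether L or R is the modified channel, consistently with the preceding remark. Names are ours and this is a sketch, not the normative formula.

```python
import numpy as np

def intermediate_downmix(L, R):
    """Sketch of the channel-reduction processing of steps E400-E404 (per bin)."""
    icpd = np.angle(L * np.conj(R))                  # E400: inter-channel phase difference
    R_int = R * np.exp(1j * icpd / 2.0)              # E401: rotate R by ICPD/2 (half-reduction)
    amp = 0.5 * (np.abs(L) + np.abs(R_int))          # E402: average amplitude
    M_int = amp * np.exp(1j * np.angle(L + R_int))   # E402: phase of the sum L + R'
    alpha_p = np.angle(M_int * np.conj(L))           # E403: phase(M') - phase(L)
    # E404 (ASSUMED form): rotate M' by alpha', i.e. angle(L, M) = 2 * angle(L, M')
    return M_int * np.exp(1j * alpha_p)
```

  • Since L and R' differ in phase by at most π/2, the sum L + R' does not suffer from the cancellation that affects a direct L + R sum for channels in phase opposition, which is the point made above.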
  • M[j] is directly calculated in the form:
  • the mono signal M can be deduced from the following calculation:
  • the preceding variants have considered different ways of calculating the mono signal according to FIGS. 4a or 4b.
  • the mono signal can be calculated either directly through its amplitude and its phase, or indirectly by rotation of the intermediate mono channel M '.
  • the determination of the phase of the mono signal is made from the phase of the signal summing the intermediate channel and the second stereo channel, and from a phase difference between, on the one hand, the signal summing the intermediate channel and the second channel and, on the other hand, the second channel of the stereo signal.
  • the X and Y channels are defined from the locally decoded channels L[j] and R[j] such that
  • the ratio Î[j] is available at the decoder and at the encoder (by local decoding).
  • the local decoding of the coder is not shown in FIG. 3 for the sake of clarity.
  • the X and Y channels are defined from the original channels L [j] and R [j] such that
  • the mono signal M can be calculated from X and Y by modifying one of the channels (X or Y).
  • the calculation of M from X and Y is deduced from FIGS. 4a and 4b as follows:
  • Î[j] represents the amplitude ratio between the decoded channels L[j] and R[j].
  • the ratio Î[j] is available at the decoder as well as at the encoder (by local decoding).
  • the mono signal is calculated by the following formula:
  • In a variant of step E402, an intermediate mono signal is calculated from the channels L[j] and R'[j] with:
  • the mono signal M' will be calculated as follows:
  • This calculation replaces step E402, while the other steps are preserved (steps E400, E401, E403, E404).
  • The difference between this calculation of the intermediate "downmix" M' and the calculation presented previously resides solely in the amplitude.
  • the "downmix" according to the invention differs from the technique of Samsudin et al. in the sense that a channel (L, R or X) is rotated by an angle less than the ICPD value; this rotation angle is obtained by reducing the ICPD by a factor strictly less than 1, the typical value being 1/2, even if the example of 3/4 was also given without restricting the possibilities.
  • the fact that the factor applied to the ICPD is of value strictly less than 1 makes it possible to qualify the angle of rotation as the result of a "reduction" of the phase difference ICPD.
  • the invention is based on a so-called "intermediate downmix", of which two essential variants have been presented. This intermediate downmix produces a mono signal whose phase (by frequency line) does not depend on a reference channel (except in the trivial case where one of the stereo channels is zero, an extreme case which is not relevant in the general case).
  • the spectra Lbuf[j] and Rbuf[j] are divided into 20 frequency subbands. These subbands are defined by the following boundaries:
  • σL[k] and σR[k] represent the energy of the left channel (Lbuf) and the right channel (Rbuf) respectively:
  • the ICLD parameters are coded by differential non-uniform scalar quantization (block 315) on 40 bits per frame. This quantization will not be detailed here because it goes beyond the scope of the invention.
  • phase information for frequencies below 1.5-2 kHz is particularly important for obtaining good stereo quality.
  • the frequency coefficients where the phase information is the most perceptually important are identified, and the associated phases are coded (block 316) by a technique detailed hereafter with reference to Figures 6a and 6b, using a budget of 40 bits per frame.
  • Figures 6a and 6b show the structure of the bitstream for the encoder in a preferred embodiment. It is a hierarchical bitstream structure resulting from scalable coding, with G.722 as core coding.
  • the mono signal is thus encoded by a G.722 coder at 56 or 64 kbit / s.
  • the G.722 core coding operates at 56 kbit / s and a first stereo extension layer (Ext.stereo 1) is added.
  • the G.722 core coding operates at 64 kbit / s and two stereo extension layers (Ext.stereo 1 and Ext.stereo 2) are added.
  • the encoder thus operates according to two possible modes (or configurations):
  • The bit stream shown in FIG. 6a includes the information on the amplitude of the stereo channels, for example the ICLD parameters as described above.
  • a 4-bit ICTD parameter is also encoded in the first coding layer.
  • the bit stream shown in FIG. 6b includes both the stereo channel amplitude information in the first extension layer (and, in a variant, an ICTD parameter) and the stereo channel phase information in the second extension layer.
  • the splitting into two extension layers shown in FIGS. 6a and 6b could be generalized to the case where at least one of the two extension layers comprises both a portion of the amplitude information and a portion of the phase information.
  • a dominant channel X and a secondary channel Y are determined for each Fourier line of index j from the L and R channels.
  • Î[j] corresponds to the amplitude ratio of the stereo channels, calculated from the ICLD parameters according to the formula:
  • the channels used are the original channels Lbuf[j] and Rbuf[j] shifted by a number of frames; since it involves calculating angles, the fact that the amplitude of these channels is the original amplitude or the amplitude decoded locally has no influence.
  • Îbuf[j] provides the information for distinguishing between X and Y, so that the encoder and the decoder use the same conventions for calculating/decoding the angle θ[j].
  • the information Îbuf[j] is available at the encoder (by local decoding) and offset by a number of frames.
  • the decision criterion Îbuf[j] used for the encoding and decoding of θ[j] is therefore identical for the encoder and the decoder.
  • the differentiation between the dominant and secondary channels in the preferred embodiment is motivated mainly by the fact that the fidelity of the stereo synthesis is different depending on whether the angles transmitted by the encoder relate to Xbuf[j] or Ybuf[j], as a function of the amplitude ratio between L and R.
  • In a variant, the channels Xbuf[j], Ybuf[j] will not be defined, but the angle θ[j] will be calculated adaptively as:
  • the coded parameters will be the parameters θ'[j] defined by:
  • the ICLD parameters of 20 subbands are encoded by non-uniform scalar quantization (block 315) on 40 bits per frame.
  • the angles θ[j] are calculated for j = 2, ..., 9 and encoded by uniform scalar quantization with a step of π/16 over 5 bits.
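  • A uniform scalar quantizer with a step of π/16 over 5 bits, as mentioned above, can be sketched as follows; 32 levels of π/16 exactly cover one full turn, and eight angles at 5 bits each account for the 40-bit budget mentioned earlier. The index mapping chosen here is an assumption, not the codec's actual mapping.

```python
import numpy as np

STEP = np.pi / 16.0   # quantization step
LEVELS = 32           # 5 bits -> 32 levels covering [-pi, pi)

def quantize_angle(theta):
    """Return a 5-bit index for a phase value theta (radians)."""
    theta = np.angle(np.exp(1j * theta))              # wrap to (-pi, pi]
    return int(np.round((theta + np.pi) / STEP)) % LEVELS

def dequantize_angle(idx):
    """Reconstruct the phase value from its 5-bit index."""
    return -np.pi + idx * STEP
```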
  • the budget allocated to code this phase information is only one particular example of embodiment. It can be lower, in which case only a reduced number of frequency lines is taken into account, or on the contrary higher, allowing a greater number of frequency lines to be coded.
  • The splitting of this spatialization information over two extension layers is a particular embodiment.
  • the invention is also applicable in the case where this information is coded in a single enhancement coding layer.
  • FIGS. 7a and 7b now illustrate the advantages that the channel reduction processing of the invention can provide over other methods.
  • FIG. 7a illustrates the variation of the phase ∠M[j] for the channel reduction processing described with reference to FIG. 4, as a function of ICLD[j] and ∠R[j].
  • The phase of the mono signal M is quasi-linear as a function of ∠R[j].
  • In the other case, the phase ∠M[j] of the mono signal is non-linear as a function of ∠R[j];
  • ∠M[j] takes values around 0, π/2, or ±π according to the values of the parameter ICLD[j]. For these signals in phase opposition and close to phase opposition, the quality of the mono signal may become poor because of the non-linear behavior of the phase ∠M[j] of the mono signal.
  • the limiting case corresponds to opposite channels
  • the advantage of the invention is to contract the angular interval in order to restrict the calculation of the intermediate mono signal to the interval [-π/2, π/2], for which the phase of the mono signal has a quasi-linear behavior.
  • the mono signal obtained from the intermediate signal then has a linear phase throughout the interval [-π, π], even for signals in phase opposition.
  • In a variant, it will be possible to systematically code the phase difference αbuf[j] between the L and M channels, instead of coding θ[j]; this variant does not distinguish the dominant and secondary channels, and is therefore simpler to achieve, but it gives a lower stereo synthesis quality.
  • the decoder can directly decode the angle αbuf[j] between L and M, but it will have to "estimate" the missing (uncoded) angle βbuf[j] between R and M; it can be shown that the accuracy of this "estimate" is less good when the L channel is dominant than when the L channel is secondary.
  • the implementation of the encoder presented previously relied on a "downmix” using a reduction of the phase difference ICPD by a factor 1/2.
  • If the "downmix" uses another reduction factor (< 1), for example of value 3/4, the principle of the coding of the stereo parameters remains unchanged.
  • the second enhancement layer will comprise the phase difference (θ[j] or αbuf[j]) defined between the mono signal and a first predetermined stereo channel.
  • This decoder comprises a demultiplexer 501 in which the coded mono signal is extracted to be decoded at 502 by a G.722 decoder in this example.
  • the portion of the bit stream (scalable) corresponding to G.722 is decoded at 56 or 64 kbit / s depending on the selected mode. It is assumed here that there is no loss of frames or bit errors on the bit stream to simplify the description, however, known frame loss correction techniques can obviously be implemented in the decoder.
  • the decoded mono signal corresponds to M (n) in the absence of channel errors.
  • a short-term discrete Fourier transform analysis with the same windowing as the encoder is performed on M (n) (blocks 503 and 504) to obtain the spectrum M [j]
  • the part of the bit stream associated with the stereo extension is also demultiplexed and the ICLD parameters are decoded (block 505).
  • the amplitudes of the left and right channels are reconstructed (block 507) by applying the decoded ICLD parameters by subband.
  • the ICLD parameter is coded/decoded by subband and not by frequency line. It is considered here that the frequency lines of index j belonging to the same subband of index k (hence in the interval [B[k], ..., B[k+1]-1]) have as their ICLD value the ICLD value of the subband.
  • Î[j] corresponds to the ratio between the two scale factors:
  • This ratio is obtained from the information encoded in the first 8 kbit / s stereo enhancement layer.
  • the associated encodings and decodings are not detailed here, but for a budget of 40 bits per frame it can be considered that this ratio is coded by subband and not by frequency line, with a non-uniform division into subbands.
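  • At the decoder, the per-line ratio can thus be obtained from the ICLD value of the subband containing the line, as stated above. The sketch below assumes the usual conversion of an energy-ratio ICLD expressed in dB into an amplitude ratio, i.e. 10^(ICLD/20); the exact scale-factor formulas of block 507 are not reproduced in the text, and the names are ours.

```python
import numpy as np

def amplitude_ratio_per_line(icld_dB, B, n_lines):
    """Map decoded subband ICLD values (dB) to a per-line amplitude ratio I[j].

    icld_dB : one decoded ICLD value per subband k.
    B       : subband boundaries; lines B[k] .. B[k+1]-1 share the ICLD of subband k.
    ASSUMPTION: ICLD is an energy ratio in dB, hence |L|/|R| = 10**(ICLD/20).
    """
    ratio = np.ones(n_lines)
    for k in range(len(B) - 1):
        ratio[B[k]:min(B[k + 1], n_lines)] = 10.0 ** (icld_dB[k] / 20.0)
    return ratio
```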
  • a 4-bit ICTD parameter is decoded from the first coding layer.
  • FIG. 9 geometrically illustrates the phase differences (angles) decoded according to the invention.
  • the L channel is the secondary channel (Y) and the R channel is the dominant channel (X).
  • Y the secondary channel
  • X the dominant channel
  • the intermediate angle β'[j] is defined as the phase difference between M' and R' as follows:
  • the phase difference between M and R is defined by:
  • FIG. 9 would still be valid, but with approximations on the fidelity of the reconstructed L and R channels, and in general a lower quality of stereo synthesis.
  • the spectra R [j] and L [j] are then converted into the time domain by inverse FFT, windowing, addition and overlap (blocks 508 to 513) to obtain the synthesized channels R (n) and L (n).
  • In step E1001, the spectrum M[j] of the mono signal is decoded.
  • the angle α represents the phase difference between a first predetermined channel of the stereo channels, here the L channel, and the mono signal.
  • In step E1004, an intermediate phase difference β' between the second channel of the modified or intermediate stereo signal, here R', and the intermediate mono signal M' is determined from the calculated phase difference α' and from the information on the amplitude of the stereo channels decoded in the first extension layer, at block 505 of FIG. 8.
  • In step E1005, the phase difference β between the second channel R and the mono signal M is determined from the intermediate phase difference β'.
  • In steps E1006 and E1007, the synthesis of the stereo signals, by frequency coefficient, is performed from the decoded mono signal and the phase differences determined between the mono signal and the stereo channels.
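  • Once the amplitudes and the two phase differences are available, the per-coefficient synthesis of steps E1006-E1007 can be sketched as below. The amplitude split assumes the average-amplitude downmix recalled earlier (|M| = (|L| + |R|)/2), and the sign convention for the angles (phase of M minus phase of the corresponding channel) is an assumption of this sketch, as are the names.

```python
import numpy as np

def synthesize_stereo(M, ratio, alpha, beta):
    """Per-bin stereo synthesis sketch (steps E1006-E1007).

    M     : decoded mono spectrum.
    ratio : I[j] = |L[j]| / |R[j]| derived from the decoded ICLD.
    alpha : phase difference between M and L (assumed sign convention).
    beta  : phase difference between M and R (assumed sign convention).
    ASSUMED amplitude rule, consistent with |M| = (|L| + |R|) / 2:
    |L| = 2|M| * I / (1 + I) and |R| = 2|M| / (1 + I).
    """
    ampL = 2.0 * np.abs(M) * ratio / (1.0 + ratio)
    ampR = 2.0 * np.abs(M) / (1.0 + ratio)
    L = ampL * np.exp(1j * (np.angle(M) - alpha))
    R = ampR * np.exp(1j * (np.angle(M) - beta))
    return L, R
```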
  • Figure 10b presents the general case where the decoded angle θ[j] adaptively corresponds to the angle α[j] or β[j].
  • In step E1101, the spectrum M[j] of the mono signal is decoded.
  • the angle θ[j] represents the phase difference between a first predetermined channel of the stereo channels (here the secondary channel) and the mono signal.
  • the other phase difference is deduced by exploiting the geometric properties of the downmix used in the invention. Since the downmix can be calculated by indifferently modifying L or R to use a modified channel L' or R', it is assumed here at the decoder that the decoded mono signal has been obtained by modifying the dominant channel X. Thus we define, as in FIG. 9, the intermediate phase difference (α' or β') between the secondary channel and the intermediate mono signal M'; this phase difference can be determined from θ[j] and from the information on the amplitude Î[j] of the stereo channels decoded in the first extension layer at block 505 of FIG. 8.
  • In step E1111, the phase difference between the second channel and the mono signal M is determined from the intermediate phase difference.
  • In step E1112, the synthesis of the stereo signals, by frequency coefficient, is performed from the decoded mono signal and the phase differences determined between the mono signal and the stereo channels.
  • the spectra R[j] and L[j] are thus calculated and then converted into the time domain by inverse FFT, windowing, addition and overlap (blocks 508 to 513) to obtain the synthesized channels R(n) and L(n).
  • the implementation of the decoder presented previously relied on a "downmix” using a reduction of the phase difference ICPD by a factor 1/2.
  • If the "downmix" uses another reduction factor (< 1), for example of value 3/4, the principle of the decoding of the stereo parameters remains unchanged.
  • the second enhancement layer will include the phase difference (θ[j] or αbuf[j]) defined between the mono signal and a first predetermined stereo channel.
  • the decoder can deduce the phase difference between the mono signal and the second stereo channel from this information.
  • the encoder presented with reference to FIG. 3 and the decoder presented with reference to FIG. 8 have been described in the case of a particular application of hierarchical coding and decoding.
  • the invention can also be applied in the case where the spatialization information is transmitted and received to the decoder in the same coding layer and for the same bit rate.
  • the invention has been described from a decomposition of stereo channels by discrete Fourier transform.
  • the invention is also applicable to other complex representations, such as for example the Modulated Complex Lapped Transform (MCLT) decomposition combining a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST), as well as the case of Pseudo-Quadrature Mirror Filter (PQMF) filter banks.
  • MCLT Modulated Complex Lapped Transform
  • MDCT modified discrete cosine transform
  • MDST modified discrete sine transform
  • PQMF Pseudo-Quadrature Mirror Filter
  • the encoders and decoders as described with reference to FIGS. 3 and 8 may be integrated in multimedia equipment of the set-top box type or audio or video content player. They can also be integrated into communication equipment of the mobile phone or communication gateway type.
  • FIG. 11a shows an exemplary embodiment of such an equipment in which an encoder according to the invention is integrated.
  • This device comprises a PROC processor cooperating with a memory block BM having a storage and / or working memory MEM.
  • the memory block may advantageously comprise a computer program comprising code instructions for implementing the steps of the coding method in the sense of the invention, when these instructions are executed by the processor PROC, and in particular the coding steps of a mono signal from a channel reduction processing applied to the stereo signal and spatialization information coding of the stereo signal.
  • the channel reduction processing includes determining, for a predetermined set of frequency subbands, a phase difference between the two stereo channels, obtaining an intermediate channel by rotating a first channel, and determining the phase of the mono signal from the phase of the signal summing the intermediate channel and the second stereo channel and from a phase difference between, on the one hand, the signal summing the intermediate channel and the second channel and, on the other hand, the second channel of the stereo signal.
  • the program may include the steps implemented to code the information adapted to this processing.
  • the descriptions of FIGS. 3, 4a, 4b and 5 show the steps of an algorithm of such a computer program.
  • the computer program can also be stored on a memory medium readable by a reader of the device or equipment or downloadable in the memory space thereof.
  • Such equipment or encoder comprises an input module adapted to receive a stereo signal comprising the R and L channels for right and left, either by a communication network, or by reading a content stored on a storage medium.
  • This multimedia equipment may also include means for capturing such a stereo signal.
  • the device comprises an output module capable of transmitting the coded spatial information parameters P c and a mono signal M originating from the coding of the stereo signal.
  • FIG. 11b illustrates an example of multimedia equipment or decoding device comprising a decoder according to the invention.
  • This device comprises a PROC processor cooperating with a memory block BM having a storage and / or working memory MEM.
  • the memory block can advantageously comprise a computer program comprising code instructions for implementing the steps of the decoding method in the sense of the invention, when these instructions are executed by the processor PROC, and in particular the decoding steps of a received mono signal, resulting from a channel reduction processing applied to the original stereo signal and decoding spatialization information of the original stereo signal, the spatialization information including first information on the amplitude of the stereo channels and a second information on the phase of the stereo channels, the second information comprising, by frequency subband, the phase difference defined between the mono signal and a first predetermined stereo channel.
  • the decoding method comprises from the phase difference defined between the mono signal and a first predetermined stereo channel, calculating a phase difference between an intermediate mono channel and the first predetermined channel for a set of frequency sub-bands.
  • the descriptions of FIGS. 8, 9 and 10 show the steps of an algorithm of such a computer program.
  • the computer program can also be stored on a memory medium readable by a reader of the device or downloadable in the memory space of the equipment.
  • the device comprises an input module able to receive the coded spatial information parameters P c and a mono signal M coming for example from a communication network. These input signals can come from a reading on a storage medium.
  • the device comprises an output module capable of transmitting a stereo signal, L and R, decoded by the decoding method implemented by the equipment.
  • This multimedia equipment may also include speaker type reproduction means or communication means capable of transmitting this stereo signal.
  • Such multimedia equipment may include both the encoder and the decoder according to the invention.
  • the input signal then being the original stereo signal and the output signal, the decoded stereo signal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP11785726.8A 2010-10-22 2011-10-18 Verbesserte stereoparametrische kodierung/dekodierung für gegenphasige kanäle Withdrawn EP2656342A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1058687A FR2966634A1 (fr) 2010-10-22 2010-10-22 Codage/decodage parametrique stereo ameliore pour les canaux en opposition de phase
PCT/FR2011/052429 WO2012052676A1 (fr) 2010-10-22 2011-10-18 Codage/decodage paramétrique stéréo amélioré pour les canaux en opposition de phase

Publications (1)

Publication Number Publication Date
EP2656342A1 true EP2656342A1 (de) 2013-10-30

Family

ID=44170214

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11785726.8A Withdrawn EP2656342A1 (de) 2010-10-22 2011-10-18 Verbesserte stereoparametrische kodierung/dekodierung für gegenphasige kanäle

Country Status (7)

Country Link
US (1) US9269361B2 (de)
EP (1) EP2656342A1 (de)
JP (1) JP6069208B2 (de)
KR (1) KR20140004086A (de)
CN (1) CN103329197B (de)
FR (1) FR2966634A1 (de)
WO (1) WO2012052676A1 (de)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8768175B2 (en) * 2010-10-01 2014-07-01 Nec Laboratories America, Inc. Four-dimensional optical multiband-OFDM for beyond 1.4Tb/s serial optical transmission
EP2702776B1 (de) * 2012-02-17 2015-09-23 Huawei Technologies Co., Ltd. Parametrischer kodierer zur kodierung eines mehrkanal-audiosignals
TWI713018B (zh) 2013-09-12 2020-12-11 瑞典商杜比國際公司 多聲道音訊系統中之解碼方法、解碼裝置、包含用於執行解碼方法的指令之非暫態電腦可讀取的媒體之電腦程式產品、包含解碼裝置的音訊系統
EP3767970B1 (de) * 2013-09-17 2022-09-28 Wilus Institute of Standards and Technology Inc. Verfahren und vorrichtung zur verarbeitung von multimediasignalen
KR102160254B1 (ko) * 2014-01-10 2020-09-25 삼성전자주식회사 액티브다운 믹스 방식을 이용한 입체 음향 재생 방법 및 장치
FR3020732A1 (fr) * 2014-04-30 2015-11-06 Orange Correction de perte de trame perfectionnee avec information de voisement
EP3353779B1 (de) 2015-09-25 2020-06-24 VoiceAge Corporation Verfahren und system zur codierung eines stereotonsignals unter verwendung von codierungsparametern eines primärkanals zur codierung eines sekundärkanals
FR3045915A1 (fr) * 2015-12-16 2017-06-23 Orange Traitement de reduction de canaux adaptatif pour le codage d'un signal audio multicanal
ES2768052T3 (es) 2016-01-22 2020-06-19 Fraunhofer Ges Forschung Aparatos y procedimientos para codificar o decodificar una señal de audio multicanal usando sincronización de control de trama
FR3048808A1 (fr) * 2016-03-10 2017-09-15 Orange Codage et decodage optimise d'informations de spatialisation pour le codage et le decodage parametrique d'un signal audio multicanal
EP3246923A1 (de) * 2016-05-20 2017-11-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und verfahren zur verarbeitung eines multikanal-audiosignals
CA3045847C (en) * 2016-11-08 2021-06-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Downmixer and method for downmixing at least two channels and multichannel encoder and multichannel decoder
MY196198A (en) * 2016-11-08 2023-03-22 Fraunhofer Ges Forschung Apparatus and Method for Downmixing or Upmixing a Multichannel Signal Using Phase Compensation
CN108269577B (zh) * 2016-12-30 2019-10-22 华为技术有限公司 立体声编码方法及立体声编码器
US10366695B2 (en) * 2017-01-19 2019-07-30 Qualcomm Incorporated Inter-channel phase difference parameter modification
CN114898761A (zh) 2017-08-10 2022-08-12 华为技术有限公司 立体声信号编解码方法及装置
CN117292695A (zh) * 2017-08-10 2023-12-26 华为技术有限公司 时域立体声参数的编码方法和相关产品
CN109389985B (zh) 2017-08-10 2021-09-14 华为技术有限公司 时域立体声编解码方法和相关产品
CN109389984B (zh) 2017-08-10 2021-09-14 华为技术有限公司 时域立体声编解码方法和相关产品
GB201718341D0 (en) 2017-11-06 2017-12-20 Nokia Technologies Oy Determination of targeted spatial audio parameters and associated spatial audio playback
US10306391B1 (en) 2017-12-18 2019-05-28 Apple Inc. Stereophonic to monophonic down-mixing
EP3550561A1 (de) * 2018-04-06 2019-10-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Downmixer, audio-codierer, verfahren und computerprogramm zur anwendung eines phasenwertes auf einen betragswert
GB2572650A (en) 2018-04-06 2019-10-09 Nokia Technologies Oy Spatial audio parameters and associated spatial audio playback
GB2574239A (en) 2018-05-31 2019-12-04 Nokia Technologies Oy Signalling of spatial audio parameters
CN112233682A (zh) * 2019-06-29 2021-01-15 华为技术有限公司 一种立体声编码方法、立体声解码方法和装置
CN111200777B (zh) * 2020-02-21 2021-07-20 北京达佳互联信息技术有限公司 信号处理方法及装置、电子设备和存储介质
KR102290417B1 (ko) * 2020-09-18 2021-08-17 삼성전자주식회사 액티브다운 믹스 방식을 이용한 입체 음향 재생 방법 및 장치
KR102217832B1 (ko) * 2020-09-18 2021-02-19 삼성전자주식회사 액티브다운 믹스 방식을 이용한 입체 음향 재생 방법 및 장치

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19959156C2 (de) * 1999-12-08 2002-01-31 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Verarbeiten eines zu codierenden Stereoaudiosignals
EP1479071B1 (de) * 2002-02-18 2006-01-11 Koninklijke Philips Electronics N.V. Parametrische audiocodierung
DE60311794C5 (de) * 2002-04-22 2022-11-10 Koninklijke Philips N.V. Signalsynthese
JP2005143028A (ja) * 2003-11-10 2005-06-02 Matsushita Electric Ind Co Ltd モノラル信号再生方法及び音響信号再生装置
CA2572805C (en) * 2004-07-02 2013-08-13 Matsushita Electric Industrial Co., Ltd. Audio signal decoding device and audio signal encoding device
US7751572B2 (en) * 2005-04-15 2010-07-06 Dolby International Ab Adaptive residual audio coding
JP4479644B2 (ja) * 2005-11-02 2010-06-09 ソニー株式会社 信号処理装置および信号処理方法
US7965848B2 (en) * 2006-03-29 2011-06-21 Dolby International Ab Reduced number of channels decoding
KR101453732B1 (ko) * 2007-04-16 2014-10-24 삼성전자주식회사 스테레오 신호 및 멀티 채널 신호 부호화 및 복호화 방법및 장치
US8385556B1 (en) * 2007-08-17 2013-02-26 Dts, Inc. Parametric stereo conversion system and method
WO2009046909A1 (en) * 2007-10-09 2009-04-16 Koninklijke Philips Electronics N.V. Method and apparatus for generating a binaural audio signal
KR101444102B1 (ko) * 2008-02-20 2014-09-26 삼성전자주식회사 스테레오 오디오의 부호화, 복호화 방법 및 장치
JP5122681B2 (ja) * 2008-05-23 2013-01-16 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ パラメトリックステレオアップミクス装置、パラメトリックステレオデコーダ、パラメトリックステレオダウンミクス装置、及びパラメトリックステレオエンコーダ
EP2144229A1 (de) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Effiziente Nutzung von Phaseninformationen beim Audio-Codieren und -Decodieren
US8233629B2 (en) * 2008-09-04 2012-07-31 Dts, Inc. Interaural time delay restoration system and method
EP2214162A1 (de) * 2009-01-28 2010-08-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Aufwärtsmischer, Verfahren und Computerprogramm zur Aufwärtsmischung eines Downmix-Tonsignals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2012052676A1 *

Also Published As

Publication number Publication date
CN103329197B (zh) 2015-11-25
JP6069208B2 (ja) 2017-02-01
US9269361B2 (en) 2016-02-23
WO2012052676A1 (fr) 2012-04-26
US20130262130A1 (en) 2013-10-03
JP2013546013A (ja) 2013-12-26
FR2966634A1 (fr) 2012-04-27
KR20140004086A (ko) 2014-01-10
CN103329197A (zh) 2013-09-25

Similar Documents

Publication Publication Date Title
WO2012052676A1 (fr) Codage/decodage paramétrique stéréo amélioré pour les canaux en opposition de phase
EP2374123B1 (de) Verbesserte codierung von mehrkanaligen digitalen audiosignalen
EP3427260B1 (de) Optimierte codierung und decodierung von verräumlichungsinformationen zur parametrischen codierung und decodierung eines mehrkanaligen audiosignals
EP2374124B1 (de) Verwaltete codierung von mehrkanaligen digitalen audiosignalen
EP2002424B1 (de) Vorrichtung und verfahren zur skalierbaren kodierung eines mehrkanaligen audiosignals auf der basis einer hauptkomponentenanalyse
EP2489039B1 (de) Optimierte parametrische codierung/decodierung mit niedrigem durchsatz
EP2691952B1 (de) Zuweisung von bits anhand von subbändern zur quantifizierung von rauminformationsparametern für parametrische codierung
EP2452337B1 (de) Zuweisung von bits bei einer verstärkten codierung/decodierung zur verbesserung einer hierarchischen codierung/decodierung digitaler tonsignale
WO2017103418A1 (fr) Traitement de réduction de canaux adaptatif pour le codage d'un signal audio multicanal
EP2005420A1 (de) Einrichtung und verfahren zur codierung durch hauptkomponentenanalyse eines mehrkanaligen audiosignals
EP2304721A1 (de) Raumsynthese mehrkanaliger tonsignale
FR2947944A1 (fr) Codage/decodage perfectionne de signaux audionumeriques
WO2011073600A1 (fr) Codage/decodage parametrique stereo avec optimisation du traitement de reduction des canaux
EP2489040A1 (de) Optimierte parametrische stereodecodierung
EP4042418B1 (de) Bestimmung von korrekturen zur anwendung auf ein mehrkanalaudiosignal, zugehörige codierung und decodierung
FR2980620A1 (fr) Traitement d'amelioration de la qualite des signaux audiofrequences decodes

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130523

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

INTG Intention to grant announced

Effective date: 20170726

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20171206