EP3391370A1 - Adaptive channel reduction processing for encoding a multichannel audio signal - Google Patents

Adaptive channel reduction processing for encoding a multichannel audio signal

Info

Publication number
EP3391370A1
Authority
EP
European Patent Office
Prior art keywords
signal
multichannel
channels
reduction processing
indicator
Legal status
Pending
Application number
EP16825835.8A
Other languages
English (en)
French (fr)
Inventor
Bertrand FATUS
Stéphane RAGOT
Current Assignee
Orange SA
Original Assignee
Orange SA
Application filed by Orange SA
Publication of EP3391370A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • The present invention relates to the field of coding/decoding of digital signals.
  • The coding and decoding according to the invention are particularly suitable for the transmission and/or storage of digital signals such as audio-frequency signals (speech, music or other).
  • The present invention relates to the parametric coding or processing of multichannel audio signals, for example stereophonic signals, hereinafter referred to as stereo signals.
  • This type of coding is based on extracting spatial information parameters so that, at decoding, these spatial characteristics can be reconstructed for the listener and the spatial image of the original signal recreated.
  • Such a parametric coding/decoding technique is described, for example, by J. Breebaart, S. van de Par, A. Kohlrausch and E. Schuijers in "Parametric Coding of Stereo Audio", EURASIP Journal on Applied Signal Processing, 2005:9, pp. 1305-1322. This example is taken up again with reference to FIGS. 1 and 2, which describe a parametric stereo encoder and decoder respectively.
  • FIG. 1 shows a stereo encoder receiving two audio channels, a left channel (denoted L) and a right channel (denoted R).
  • The time signals L(n) and R(n), where n is the integer index of the samples, are processed by blocks 101, 102, 103 and 104, which perform a short-term Fourier analysis.
  • This yields the transformed signals L[k] and R[k], where k is the integer index of the frequency coefficients.
  • Block 105 performs channel reduction processing ("downmix") to obtain, in the frequency domain, a monophonic signal (hereinafter "mono signal") from the left and right signals.
  • Block 105 also extracts the spatial information parameters.
  • The extracted parameters are as follows.
  • ICLD (InterChannel Level Difference): the interchannel level differences characterize the energy ratio, per frequency subband, between the left and right channels.
  • Each frequency band of index b comprises the frequency lines in the interval $[k_b, k_{b+1}-1]$, and the symbol * denotes the complex conjugate.
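The ICLD formula itself did not survive extraction. As a sketch, assuming the usual definition consistent with the interval and conjugate notation above, $ICLD[b] = 10\log_{10}\left(\sum_{k=k_b}^{k_{b+1}-1} L[k]L^*[k] \big/ \sum_{k=k_b}^{k_{b+1}-1} R[k]R^*[k]\right)$, the per-subband computation can be written as follows. A companion ICPD[b], taken here as the angle of the summed cross-spectrum, is included for illustration; the function names and the small guard constant are hypothetical.

```python
import cmath
import math

EPS = 1e-12  # hypothetical guard against log10(0) in silent subbands


def icld_db(L, R, band_edges):
    """Per-subband ICLD in dB.

    band_edges = [k_0, k_1, ..., k_B]; subband b covers lines k_b .. k_{b+1}-1.
    X[k] * conj(X[k]) is |X[k]|^2, matching the '*' (complex conjugate) notation.
    """
    icld = []
    for b in range(len(band_edges) - 1):
        lines = range(band_edges[b], band_edges[b + 1])
        e_l = sum((L[k] * L[k].conjugate()).real for k in lines)
        e_r = sum((R[k] * R[k].conjugate()).real for k in lines)
        icld.append(10.0 * math.log10((e_l + EPS) / (e_r + EPS)))
    return icld


def icpd_rad(L, R, band_edges):
    """Per-subband phase difference: angle of sum_k L[k] * conj(R[k])."""
    return [
        cmath.phase(sum(L[k] * R[k].conjugate()
                        for k in range(band_edges[b], band_edges[b + 1])))
        for b in range(len(band_edges) - 1)
    ]


# Left channel 6 dB louder than right in both subbands, in phase.
print(icld_db([2, 2, 2, 2], [1, 1, 1, 1], [0, 2, 4]))  # approximately [6.02, 6.02]
```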
  • ICTD (InterChannel Time Difference).
  • ICC (InterChannel Coherence): these parameters represent the inter-channel correlation (or coherence) and are associated with the spatial width of the sound sources. Their definition is not recalled here, but Breebaart et al. note that the ICC parameters are not needed in subbands reduced to a single frequency coefficient, since the amplitude and phase differences then completely describe the spatialization in this "degenerate" case.
  • ICLD, ICPD and ICC are extracted by analysis of the stereo signals in block 105. If the ICTD (or ITD) parameters were also coded, they could likewise be extracted per subband from the spectra L[k] and R[k]; in general, however, ITD extraction is simplified by assuming the same inter-channel time shift in every subband, in which case a single parameter can be extracted from the time channels L(n) and R(n) through cross-correlation.
  • The mono signal M[k] is transformed back to the time domain by short-term Fourier synthesis (inverse FFT, windowing and overlap-add, blocks 106 to 108), and mono coding (block 109) is then performed.
  • The stereo parameters are quantized and coded in block 110.
  • The spectrum of the signals (L[k], R[k]) is divided according to a nonlinear frequency scale of the ERB (Equivalent Rectangular Bandwidth) or Bark type, typically with 20 to 34 subbands for a signal sampled at 16 to 48 kHz on the Bark scale. This scale defines the values of $k_b$ and $k_{b+1}$ for each subband b.
  • The parameters (ICLD, ICPD, ICC, ITD) are coded by scalar quantization, possibly followed by entropy coding and/or differential coding.
  • The ICLD is coded by a non-uniform quantizer (ranging from −50 to +50 dB) with differential entropy coding.
  • The non-uniform quantization step exploits the fact that the larger the ICLD value, the lower the auditory sensitivity to variations of this parameter.
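The source does not give the quantization table. The sketch below only illustrates the principle described above: a hypothetical symmetric table over −50..+50 dB whose steps widen as |ICLD| grows, nearest-level quantization, and differential coding of the indices across subbands (one possible choice; the entropy coder is not shown). The table values are invented for illustration and are not taken from the patent or any standard.

```python
# Hypothetical non-uniform ICLD levels in dB (illustrative only): the step
# grows with |ICLD| because hearing is less sensitive to changes there.
ICLD_LEVELS = [-50, -40, -30, -25, -20, -16, -12, -9, -6, -4, -2, 0,
               2, 4, 6, 9, 12, 16, 20, 25, 30, 40, 50]


def quantize_icld(value_db):
    """Index of the nearest table level, after clamping to [-50, +50] dB."""
    v = max(-50.0, min(50.0, value_db))
    return min(range(len(ICLD_LEVELS)), key=lambda i: abs(ICLD_LEVELS[i] - v))


def dequantize_icld(index):
    """Reconstructed ICLD value (dB) for a quantization index."""
    return ICLD_LEVELS[index]


def differential_indices(values_db):
    """Quantize per-subband ICLDs; send subband 0 as an absolute index and
    the others as differences to the previous subband (entropy coding omitted)."""
    idx = [quantize_icld(v) for v in values_db]
    return [idx[0]] + [idx[b] - idx[b - 1] for b in range(1, len(idx))]


print(differential_indices([0.0, 2.1, 5.5]))  # [11, 1, 2]
```

Small differences between neighboring subbands give small integers, which is what makes a subsequent entropy coder effective.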
  • For the coding of the mono signal (block 109), several quantization techniques with or without memory are possible, for example pulse code modulation (PCM), its adaptive-prediction version called adaptive differential pulse code modulation (ADPCM), or more advanced techniques such as perceptual transform coding, Code Excited Linear Prediction (CELP) coding or multi-mode coding.
  • The input signal of the EVS (Enhanced Voice Services) codec is sampled at 8, 16, 32 or 48 kHz, and the codec can represent narrowband (NB), wideband (WB), super-wideband (SWB) or fullband (FB) audio.
  • The bit rates of the EVS codec are divided into two modes:
  • variable bit rate (VBR) mode;
  • discontinuous transmission (DTX) mode;
  • SID frames: SID Primary or SID AMR-WB IO.
  • In the decoder of FIG. 2, the mono signal is decoded (block 201) and a decorrelator (block 202) produces two versions, M(n) and M'(n), of the decoded mono signal.
  • This decorrelation, which is needed only when the ICC parameter is used, increases the spatial width of the mono source M(n).
  • The two signals M(n) and M'(n) are transformed to the frequency domain (blocks 203 to 206), and the decoded stereo parameters (block 207) are used by the stereo synthesis (or shaping) in block 208 to reconstruct the left and right channels in the frequency domain. These channels are finally reconstructed in the time domain (blocks 209 to 214).
  • Block 105 performs a channel reduction processing ("downmix") by combining the stereo channels (left, right) into a mono signal, which is then coded by a mono encoder.
  • The spatial parameters (ICLD, ICPD, ICC, etc.) are extracted from the stereo channels and transmitted in addition to the bitstream of the mono encoder.
  • In a passive downmix, the stereo channels are combined into a single signal by direct matrixing; the coefficients of the downmix matrix are generally real and have predetermined (fixed) values.
  • $M[k] = \gamma[k]\,\dfrac{L[k] + R[k]}{2}$ (5), where $\gamma[k]$ is a gain applied to each frequency coefficient.
  • Here k is the index of a frequency coefficient (for example a Fourier coefficient representing a frequency subband).
  • In the method of Samsudin et al., the phase of channel L in each frequency subband is chosen as the reference phase, and channel R is aligned on the phase of channel L in each subband before the channels are combined.
  • An ideal conversion of a stereo signal to a mono signal should avoid attenuating any frequency component of the signal.
  • This downmix operation is important for parametric stereo coding, because the decoded stereo signal is only a spatial shaping of the decoded mono signal.
  • The frequency-domain downmix technique described above preserves the energy level of the stereo signal in the mono signal by aligning channel R on channel L before processing. This phase alignment avoids situations where the channels are in phase opposition.
  • In the method of Hoang et al., the amplitude of M[k] is the average of the amplitudes of channels L and R, and the phase of M[k] is the phase of the sum of the two stereo channels (L + R).
  • The method of Hoang et al. preserves the energy of the mono signal, like the method of Samsudin et al., and it avoids making the phase of M[k] depend entirely on one of the stereo channels (L or R).
  • However, this method has a drawback when channels L and R are in near phase opposition in some subbands (the extreme case being L = −R): under these conditions, the resulting mono signal is of poor quality.
  • Moreover, this method does not directly take into account phase changes that may occur between successive frames, which can cause phase jumps.
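The three prior-art downmixes discussed above can be compared on a single frequency line. The sketch assumes per-line processing (Samsudin et al. align the phases per subband), takes the passive downmix as (L + R)/2, and relies on Python's convention that the phase of 0 is 0; the function names are hypothetical. It reproduces the weaknesses described above: the passive downmix cancels when L = −R, and with the Hoang-type rule a tiny perturbation around L = −R swings the phase of M[k] by π.

```python
import cmath


def passive_downmix(l, r):
    """Fixed real matrixing: M[k] = (L[k] + R[k]) / 2."""
    return (l + r) / 2


def samsudin_like_downmix(l, r):
    """R is rotated onto the phase of L (the reference), then the two are averaged."""
    r_aligned = abs(r) * cmath.exp(1j * cmath.phase(l))
    return (l + r_aligned) / 2


def hoang_like_downmix(l, r):
    """|M| = mean of |L| and |R|; phase of M = phase of L + R."""
    return (abs(l) + abs(r)) / 2 * cmath.exp(1j * cmath.phase(l + r))


l = 1 + 0j
for r in (-1 + 1e-9j, -1 - 1e-9j):  # two nearly identical lines with L close to -R
    print(passive_downmix(l, r), samsudin_like_downmix(l, r), hoang_like_downmix(l, r))
# passive: about 0; Samsudin-like: about 1 both times; Hoang-like: about +1j then -1j
```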
  • The invention improves on this state of the art.
  • The method provides a channel reduction processing suited to the multichannel signal to be coded, especially when the channels of this signal are in phase opposition.
  • Since the downmix is adapted per frequency unit, that is, per frequency subband or per frequency line, it can follow the fluctuations of the multichannel signal from one frame to the next.
  • The method further comprises determining a phase indicator, representative of a measure of the degree of phase opposition between the channels of the multichannel signal, and one of the channel reduction processing modes of said set depends on the value of the phase indicator.
  • A dedicated downmix processing is thus performed for signals whose channels are in phase opposition.
  • This processing adapts to the fluctuations of the signal over time.
  • The set of channel reduction processing modes includes several processing modes from the following list:
  • a hybrid channel reduction processing that depends on a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal.
  • The indicator characterizing the channels of the multichannel audio signal is an indicator measuring the correlation between the channels of the multichannel audio signal.
  • This indicator makes it possible to adapt the channel reduction processing to the inter-channel correlation characteristics of the multichannel audio signal.
  • This indicator is simple to determine, and the quality of the downmix is improved.
  • Alternatively, the indicator characterizing the channels of the multichannel audio signal is a phase indicator, representative of a measure of the degree of phase opposition between the channels of the multichannel signal.
  • This indicator makes it possible to adapt the channel reduction processing to the phase characteristics of the channels of the multichannel audio signal, in particular for signals whose channels are in phase opposition.
  • The invention also relates to a device for parametric coding of a multichannel digital audio signal, comprising an encoder able to code a mono signal produced by a channel reduction processing module applied to the multichannel signal, and a quantization module for coding spatialization information of the multichannel signal.
  • The device is remarkable in that the channel reduction processing module comprises:
  • an extraction module able to obtain, per spectral unit of the multichannel signal, at least one indicator characterizing the channels of the multichannel digital audio signal;
  • a selection module able to select, per spectral unit of the multichannel signal and from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.
  • The invention also applies to a method of processing a decoded multichannel audio signal, comprising a channel reduction processing to obtain a mono signal to be reproduced.
  • The method is remarkable in that the channel reduction processing comprises steps implemented per spectral unit of the multichannel signal.
  • The method makes it possible to carry out, in a simple manner, a downmix processing adapted to the received signal.
  • The processing method further comprises determining a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal, and one of the channel reduction processing modes of said set depends on the value of the phase indicator.
  • A dedicated downmix processing is thus performed for decoded signals whose channels are in phase opposition.
  • This processing adapts to the fluctuations of the signal over time.
  • The set of channel reduction processing modes includes several processing modes from the following list:
  • a hybrid channel reduction processing that depends on a phase indicator representative of a measure of the degree of phase opposition between the channels of the multichannel signal.
  • The indicator characterizing the channels of the multichannel audio signal is an indicator measuring the correlation between the channels of the multichannel audio signal.
  • This indicator is used to adapt the channel reduction processing to the inter-channel correlation characteristics of the decoded multichannel audio signal.
  • This indicator is simple to determine, and the quality of the downmix is improved.
  • Alternatively, the indicator characterizing the channels of the multichannel audio signal is a phase indicator, representative of a measure of the degree of phase opposition between the channels of the multichannel signal.
  • This indicator makes it possible to adapt the channel reduction processing to the phase characteristics of the channels of the multichannel audio signal, in particular for signals whose channels are in phase opposition.
  • The invention also relates to a device for processing a decoded multichannel audio signal, comprising a channel reduction processing module for obtaining a mono signal to be reproduced, remarkable in that the channel reduction processing module comprises:
  • an extraction module able to obtain, per spectral unit of the multichannel signal, at least one indicator characterizing the channels of the multichannel digital audio signal;
  • a selection module able to select, per spectral unit of the multichannel signal and from among a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.
  • This device has the same advantages as the method it implements, described above.
  • The invention relates to a computer program comprising code instructions for implementing the steps of a coding method according to the invention when these instructions are executed by a processor.
  • Finally, the invention relates to a processor-readable storage medium on which is recorded a computer program comprising code instructions for performing the steps of the method as described.
  • FIG. 1 illustrates an encoder implementing a parametric coding known from the state of the art and described previously;
  • FIG. 2 illustrates a decoder implementing a parametric decoding known from the state of the art and described previously;
  • FIG. 3 illustrates a parametric stereo encoder according to one embodiment of the invention;
  • FIGS. 4a, 4b, 4c, 4d, 4e and 4f illustrate, in flowchart form, the steps of the channel reduction processing according to different embodiments of the invention;
  • FIG. 5 illustrates an example of the evolution, for a given signal, of an indicator characterizing the channels of a multichannel signal, used according to one embodiment of the invention;
  • FIG. 6 illustrates an example of possible weightings as a function of the value of an indicator characterizing the channels of a signal, according to one embodiment of the invention;
  • FIG. 7 illustrates a parametric stereo decoder implementing a decoding suited to signals coded according to the coding method of the invention;
  • FIG. 8 illustrates a device for processing a decoded audio signal in which a channel reduction processing according to the invention is carried out;
  • FIG. 9 illustrates a hardware example of a device incorporating an encoder able to implement the coding method according to one embodiment of the invention.
  • This figure shows both the entities (hardware modules, or software modules driven by a processor) of the coding device and the steps implemented by the coding method according to one embodiment of the invention.
  • The invention applies similarly to other types of mono coding (e.g. IETF OPUS, ITU-T G.722) operating at the same or different sampling rates.
  • Each time channel (L(n) and R(n)), sampled at 16 kHz, is first pre-filtered by a high-pass filter (HPF) typically eliminating components below 50 Hz (blocks 301 and 302).
  • This pre-filtering is optional, but it can be used to avoid DC bias when estimating parameters such as ICTD or ICC.
  • The channels L'(n) and R'(n) output by the pre-filtering blocks are analyzed in frequency by a discrete Fourier transform with sinusoidal windowing of length 40 ms (640 samples) and 50% overlap (blocks 303 to 306).
  • The 40 ms analysis window covers the current frame and the future frame.
  • The future frame corresponds to a "future" signal segment of 20 ms, commonly called "lookahead".
  • Other windows may be used, for example the asymmetric low-delay window called "ALDO" in the EVS codec.
  • The analysis windowing can also be made adaptive to the current frame, using a long window on stationary segments and short windows on transient/non-stationary segments, possibly with transition windows between long and short windows.
  • The coefficients of index 0 < k < 160 are complex, and each corresponds to a 25 Hz subband centered on the frequency k × 25 Hz.
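The analysis of blocks 303 to 306 can be sketched with a plain DFT. Assumptions: a sine window of the form sin(π(n + 0.5)/640) (the text only says "sinusoidal"), which satisfies the overlap-add condition at 50% overlap; no pre-filtering; and a naive DFT instead of an FFT to keep the example dependency-free. With 640 samples at 16 kHz, bin k is centered on k × 25 Hz, so a 1 kHz tone peaks at k = 40.

```python
import cmath
import math

FS = 16000    # sampling rate (Hz)
N_FFT = 640   # 40 ms window: current 20 ms frame + 20 ms lookahead
HOP = 320     # 20 ms frame, i.e. 50% overlap

# Assumed sine window; w[n]^2 + w[n + HOP]^2 = 1, so analysis and synthesis
# windowing together overlap-add to unity.
WINDOW = [math.sin(math.pi * (n + 0.5) / N_FFT) for n in range(N_FFT)]


def analyze_frame(block):
    """Windowed DFT of one 640-sample block; returns bins 0..N_FFT/2."""
    xw = [w * s for w, s in zip(WINDOW, block)]
    return [
        sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N_FFT) for n in range(N_FFT))
        for k in range(N_FFT // 2 + 1)
    ]


tone = [math.cos(2 * math.pi * 1000 * n / FS) for n in range(N_FFT)]
spectrum = analyze_frame(tone)
peak = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
print(peak, peak * FS / N_FFT)  # 40 1000.0
```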
  • The spectra L[k] and R[k] are combined in block 307, described later, to obtain a mono (downmix) signal M[k] in the frequency domain.
  • This signal is converted to the time domain by inverse FFT and windowing/overlap-add with the "lookahead" part of the previous frame (blocks 308 to 310).
  • The lookahead used to compute the mono signal (20 ms) and the mono coding/decoding delay, to which the delay T for aligning the mono synthesis is added (20 ms), amount to an additional delay of 2 frames (40 ms) relative to the current frame.
  • The delayed mono signal is then coded (block 312) by the EVS mono encoder, for example at a bit rate of 13.2, 16.4 or 24.4 kbit/s.
  • Alternatively, the coding may be performed directly on the non-delayed signal, in which case the delay can be applied after decoding.
  • Block 313 introduces a delay of two frames on the spectra L[k], R[k] and M[k] to obtain the spectra $L_{buf}[k]$, $R_{buf}[k]$ and $M_{buf}[k]$.
  • The stereo spatial information is coded in blocks 314 to 317.
  • The stereo parameters are extracted (block 314) and coded (blocks 315 to 317) from the spectra L[k], R[k] and M[k] delayed by two frames, namely $L_{buf}[k]$, $R_{buf}[k]$ and $M_{buf}[k]$.
  • The channel reduction ("downmix") processing block 307 is now described in more detail.
  • This processing block 307 comprises a module 307a for obtaining at least one indicator characterizing the channels of the multichannel signal, here the stereo signal.
  • The indicator may be, for example, an inter-channel correlation indicator or an indicator of the degree of phase opposition between the channels. How these indicators are obtained is described later.
  • The selection block 307b selects a downmix processing mode from among a set of downmix processing modes, and this mode is applied in block 307c to the input signals, here the stereo signal L[k], R[k], to give a mono signal M[k].
  • FIGS. 4a to 4f illustrate various embodiments implemented by the processing block 307.
  • The parameter ICPD[k] is calculated in the current frame for each frequency line k.
  • This parameter corresponds to the phase difference between channels L and R. It is used here to define the ICCr parameter.
  • $N_{FFT}$ denotes the length of the FFT.
  • The complex modulus may be omitted, but in that case any use of the ICCp parameter (or its derivatives) must take into account the signed value of this parameter.
  • This parameter may optionally be smoothed to reduce temporal variations. If the current frame has index m, this smoothing can be computed with a moving-average (MA) filter of order 2.
  • Hereafter, "the ICCr parameter" designates ICCr[m] (the index of the current frame is omitted); if no smoothing is applied, ICCr corresponds directly to ICCp.
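The formulas for ICCp and its smoothing were lost in extraction. A minimal sketch under two stated assumptions: ICCp is taken as the normalized cross-spectrum after compensating each line by ICPD[k], which reduces to Σ|L[k]||R[k]| / sqrt(Σ|L[k]|² Σ|R[k]|²) over k = 0..N_FFT/2 and therefore ignores phase differences as described; and the order-2 moving average is taken as the two-tap mean of the current and previous frames. Both forms and the class name are hypothetical.

```python
import math


def iccp(L, R):
    """Assumed phase-insensitive correlation over lines k = 0..N_FFT/2:
    sum |L[k]||R[k]| / sqrt(sum |L[k]|^2 * sum |R[k]|^2), in [0, 1]."""
    num = sum(abs(l) * abs(r) for l, r in zip(L, R))
    den = math.sqrt(sum(abs(l) ** 2 for l in L) * sum(abs(r) ** 2 for r in R))
    return num / den if den > 0 else 0.0


class ICCrSmoother:
    """Hypothetical two-tap moving average: ICCr[m] = (ICCp[m] + ICCp[m-1]) / 2."""

    def __init__(self):
        self.previous = None

    def update(self, iccp_m):
        iccr_m = iccp_m if self.previous is None else 0.5 * (iccp_m + self.previous)
        self.previous = iccp_m
        return iccr_m


# R equals L up to a per-line phase rotation: fully correlated once phase is ignored.
L = [1.0, 2j, -0.5]
R = [1j, 2.0, 0.5]
print(iccp(L, R))  # 1.0
```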
  • Other smoothing methods may be used, for example an AR (autoregressive) filter, or smoothing applied to the signals themselves.
  • The ICCr parameter quantifies the level of correlation between channels L and R when the phase differences between these channels are ignored.
  • The ICCp parameter can also be defined per subband simply by changing the bounds of the sums to $k = k_b, \ldots, k_{b+1}-1$, the indices of the frequency lines in the subband of index b.
  • The ICCp[b] parameter can also be smoothed. In this case the invention is implemented as follows: instead of a single comparison with ICCr[m], there are as many comparisons with ICCp[b] as there are subbands of index b.
  • The dominant channel is also identified, for use as the phase reference.
  • This dominant channel can be determined via a sign parameter SGN, calculated for the current frame as the sign of the difference between the channel levels.
  • The condition for authorizing a switch of the phase reference can be defined per frequency line and can depend on the type of downmix used in the current frame (index m) and in the previous frame (index m−1). For example, if the downmix for line k in frame m−1 was of the passive type (with gain compensation) and the downmix selected in frame m is a downmix with alignment on an adaptive phase reference, a switch of the phase reference can be authorized.
  • Switching the phase reference is forbidden for line k as long as the downmix explicitly uses the phase reference given by the parameter SGN.
  • The sign parameter SGN[m] therefore only changes its value when ICCr is below a threshold (in the preferred embodiment). This precaution avoids changing the phase reference in regions where the channels are highly correlated and potentially in phase opposition.
  • Another criterion may be used to define the conditions for switching the phase reference.
  • The binary decision associated with computing SGN may be stabilized to avoid potentially rapid fluctuations. For example, a tolerance of ±3 dB can be defined on the levels of channels L and R to implement a hysteresis that prevents a change of phase reference as long as the tolerance is not exceeded. Inter-frame smoothing can also be applied to the signal level.
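The dominant-channel logic above can be sketched as follows. Assumptions: channel levels are frame energies converted to dB, SGN = +1 selects L and −1 selects R as the phase reference, the switching condition is the ICCr < 0.4 criterion quoted later in the text, and the ±3 dB tolerance implements the hysteresis; the function name and the guard constant are hypothetical.

```python
import math


def update_sgn(prev_sgn, energy_l, energy_r, iccr, icc_threshold=0.4, tol_db=3.0):
    """Return +1 (L is the phase reference) or -1 (R is the phase reference)."""
    if iccr >= icc_threshold:
        # Channels too correlated: never switch the phase reference here.
        return prev_sgn
    level_diff_db = 10.0 * math.log10((energy_l + 1e-12) / (energy_r + 1e-12))
    if level_diff_db > tol_db:
        return 1
    if level_diff_db < -tol_db:
        return -1
    # Inside the +/- tol_db tolerance band: hysteresis keeps the old reference.
    return prev_sgn


print(update_sgn(1, 1.0, 10.0, iccr=0.2))   # -1: R is 10 dB louder, switch allowed
print(update_sgn(1, 1.0, 10.0, iccr=0.9))   # 1: switch forbidden, ICCr too high
print(update_sgn(-1, 1.5, 1.0, iccr=0.1))   # -1: 1.76 dB is within the tolerance
```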
  • The parameter SGN can also be calculated with another definition of the channel levels.
  • When L[k] ≈ −R[k] (phase opposition), the denominator of $ISD[k] = |L[k] - R[k]| \,/\, |L[k] + R[k]|$ tends to zero and the ISD value becomes arbitrarily large.
  • The division in the calculation of the ISD parameter can be avoided because ISD is then compared to a threshold. It is common to add a small non-zero value to the denominator to avoid a division by zero; this precaution is unnecessary here because, in the embodiments of the invention, the division is not performed.
  • The comparison $ISD[k] > th_0$ is equivalent to the comparison $|L[k] - R[k]| > th_0 \cdot |L[k] + R[k]|$, which keeps the complexity of the downmix mode selection low.
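The division-free test can be checked against the direct ratio. In this sketch ISD[k] is |L[k] − R[k]| / |L[k] + R[k]|, the form implied by the equivalence above; the value returned when both channels are zero, and the function names, are assumptions.

```python
def isd(l, r):
    """Direct form: |L[k] - R[k]| / |L[k] + R[k]| (infinite at exact L = -R)."""
    num, den = abs(l - r), abs(l + r)
    if den == 0:
        return float("inf") if num > 0 else 0.0   # 0/0 (silent line) taken as 0
    return num / den


def isd_exceeds(l, r, th0):
    """Division-free equivalent of isd(l, r) > th0, for th0 >= 0."""
    return abs(l - r) > th0 * abs(l + r)


for l, r in [(1, 0.5), (1, -0.9), (1j, 1), (2, -2), (0, 0)]:
    print(l, r, isd(l, r), isd_exceeds(l, r, th0=1.3))
```

The division-free form also handles exact phase opposition (L = −R) without any special case.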
  • FIG. 4a illustrates the steps implemented for the channel reduction processing of block 307.
  • First, an indicator characterizing the channels of the multichannel audio signal is obtained.
  • Here it is the ICCr parameter defined above, calculated from the ICPD parameter.
  • The ICCr indicator corresponds to a measure of correlation between the channels of the multichannel signal, here between the channels of the stereo signal.
  • The choice of downmix depends mainly on the indicator ICCr[m], calculated as explained above from the L and R channels of the current frame, with any smoothing applied.
  • The choice between downmix processing modes is thus based on the value of the ICCr[m] indicator.
  • The downmix M1[k] is defined as a sum of the two channels with energy equalization by a gain γ[k].
  • This downmix is effective for stereo signals (and their frequency decompositions into lines or subbands) whose channels are not highly correlated and do not have a complex phase relationship. Since it is not used for problematic signals, where the gain γ[k] could take arbitrarily large values, no limitation of the gain is applied here; in variants, however, the amplification could be limited.
  • This equalization by the gain γ[k] may be done differently; for example, the value already quoted above could be used.
  • The advantage of the gain γ[k] is that it gives the downmix M1[k] the same amplitude level as the other downmixes used. It is therefore preferable to adjust γ[k] so as to ensure a homogeneous amplitude or energy level across the different downmixes.
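The equation for M1[k] and its gain did not survive extraction. A sketch, assuming M1[k] = γ[k](L[k] + R[k])/2 with γ[k] = (|L[k]| + |R[k]|) / |L[k] + R[k]|: this choice gives M1[k] the amplitude (|L[k]| + |R[k]|)/2 of the phase-aligned downmixes, as the paragraph above recommends, but it is an assumption, not the patent's formula. The optional cap illustrates the gain-limiting variant; the function name is hypothetical.

```python
def m1_downmix(l, r, gamma_max=None):
    """Sum downmix with energy equalization: M1[k] = gamma[k] * (L[k] + R[k]) / 2.

    Assumed gain gamma[k] = (|L| + |R|) / |L + R|, so |M1[k]| = (|L| + |R|) / 2
    while the phase stays that of L + R. gamma grows without bound as L -> -R.
    """
    s = l + r
    if s == 0:
        return 0j                         # exact cancellation: no phase to keep
    gamma = (abs(l) + abs(r)) / abs(s)
    if gamma_max is not None:
        gamma = min(gamma, gamma_max)     # variant: limit the amplification
    return gamma * s / 2


print(abs(m1_downmix(1, 1j)))               # about 1.0 (plain average: about 0.707)
print(m1_downmix(1, -0.9, gamma_max=2.0))   # about 0.1: gain capped near opposition
```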
  • The phase of this downmix can also be expressed in an equivalent form.
  • This downmix is similar to the one proposed in the method of Samsudin et al. above; here, however, the reference phase is not given by channel L, and the phase is determined line by line rather than per frequency band.
  • Here the phase is set according to the dominant channel identified by the parameter SGN.
  • This downmix is well suited to highly correlated signals, for example sound signals captured with A-B or binaural microphone setups. Independent channels may also show a fairly strong correlation even when the same signal is not recorded in channels L and R; to avoid inadvertent switching of the phase reference, such switching is preferably allowed only when the signals present no risk of generating audio artifacts with this downmix. This explains the constraint ICCr[m] < 0.4 in the calculation of the SGN[m] parameter when the phase reference switching condition uses this criterion.
  • 3. Hybrid downmix between a passive downmix (with gain compensation) and a downmix with alignment on an adaptive phase reference, depending on an indicator of the degree of phase opposition between the channels (ISD[k], as defined above).
  • This downmix is applied when the signals are moderately correlated and potentially in phase opposition.
  • The parameter ISD[k] is used here to detect a phase relationship close to phase opposition; in that case it is preferable to select the downmix with alignment on an adaptive phase reference, M3[k]; otherwise the passive downmix with gain compensation, M1[k], is sufficient.
  • The downmix M2[k] therefore corresponds to either M1[k] or M3[k], depending on the value of ISD[k]. In variants of the invention, M2[k] need not be defined explicitly; instead, the downmix selection decisions and the criterion on ISD[k] can be combined. Such an example is given in FIG. 4c, and it applies equally to all the embodiments presented here.
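The hybrid downmix M2[k] can be sketched per line as a switch between M1[k] and M3[k]. Assumptions not given in the source: M3[k] has amplitude (|L[k]| + |R[k]|)/2 and the phase of the dominant channel selected by SGN; M1[k] uses the gain (|L[k]| + |R[k]|)/|L[k] + R[k]|; the ISD test is the division-free form; th0 is a free parameter. Function names are hypothetical.

```python
import cmath


def m1_downmix(l, r):
    """Passive sum with gain compensation (assumed gain, see lead-in)."""
    s = l + r
    return 0j if s == 0 else (abs(l) + abs(r)) / abs(s) * s / 2


def m3_downmix(l, r, sgn):
    """Alignment on an adaptive phase reference: phase of L if sgn > 0, else of R."""
    reference = l if sgn > 0 else r
    return (abs(l) + abs(r)) / 2 * cmath.exp(1j * cmath.phase(reference))


def m2_downmix(l, r, sgn, th0):
    """Hybrid: M3 near phase opposition (ISD[k] > th0, division-free), else M1."""
    if abs(l - r) > th0 * abs(l + r):
        return m3_downmix(l, r, sgn)
    return m1_downmix(l, r)


print(m2_downmix(1, -0.99, sgn=1, th0=2.0))   # about 0.995: M3, phase of L kept
print(m2_downmix(1, 1j, sgn=1, th0=2.0))      # about 0.707+0.707j: M1 is used
```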
  • If, in step E401, the indicator is lower than a first threshold th1, a first downmix processing mode, M1, is implemented in step E402.
  • If, in step E403, the indicator is lower than a second threshold th2, a second downmix processing mode, combining M1 and M2, is implemented in step E404.
  • If, in step E405, the indicator is greater than the third threshold th3, a fourth downmix processing mode, M3, is implemented in step E407.
  • The threshold values th1, th2, th3 may be set to other values; the values given here typically correspond to a frame length of 20 ms.
  • The weightings of the combination functions f1(.,.) and f2(.,.) are shown in FIG. 6. These combination functions perform a "cross-fade" between different downmixes to avoid threshold effects, that is, transitions that are too abrupt between the respective downmixes from one frame to the next for a given line. Any weighting functions with complementary values between 0 and 1 over the defined range are suitable, but in this embodiment these functions are derived from a common base function.
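The FIG. 4a decision chain for one line can be sketched as below. The threshold values are placeholders (the source excerpt does not reproduce them), the linear ramp stands in for the weighting functions of FIG. 6, and mapping the th2..th3 interval to a cross-fade f2(M2, M3) in a step E406 is an assumption inferred from the four modes and from FIG. 4d. The inputs are the candidate downmix values already computed for the line; function names are hypothetical.

```python
def ramp(x, lo, hi):
    """Hypothetical complementary weight: 0 at x = lo, 1 at x = hi, clipped."""
    if hi <= lo:
        return 1.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def downmix_fig4a(iccr, m1, m2, m3, th1=0.1, th2=0.3, th3=0.5):
    """Select or cross-fade downmixes for one line from the ICCr indicator.
    Threshold defaults are placeholders, not values from the patent."""
    if iccr < th1:                       # E401 -> E402: weakly correlated
        return m1
    if iccr < th2:                       # E403 -> E404: f1(M1, M2)
        w = ramp(iccr, th1, th2)
        return (1 - w) * m1 + w * m2
    if iccr < th3:                       # assumed E406: f2(M2, M3)
        w = ramp(iccr, th2, th3)
        return (1 - w) * m2 + w * m3
    return m3                            # E407: highly correlated


for iccr in (0.05, 0.2, 0.4, 0.9):
    print(iccr, downmix_fig4a(iccr, m1=1.0, m2=2.0, m3=3.0))
```

With these weights the output is continuous at th2: f1 reaches M2 just as f2 starts from M2, which is the point of the cross-fade.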
  • The parameter ICCr[m] is defined here at the level of the current frame; in variants, this parameter can be estimated per frequency band (for example according to the ERB or Bark scale).
  • FIG. 4b illustrates the steps implemented for the channel reduction processing of block 307 in another embodiment.
  • This variant aims to simplify the decision on the downmix method to be used and to reduce complexity, by not cross-fading between two downmix methods.
  • Steps E400, E401, E402, E405 and E407 are identical to those described with reference to FIG. 4a.
  • If, in step E401, the indicator is lower than a first threshold th1, a first downmix processing mode, M1, is implemented in step E402.
  • If, in step E405, the indicator is lower than the threshold th3, a second downmix processing mode, M2, is implemented in step E410.
  • If, in step E405, the indicator is greater than the threshold th3, a third downmix processing mode, M3, is implemented in step E407.
  • The downmixes M1, M2 and M3 are, for example, those described above.
  • The downmix M2 is a hybrid between downmixes M1 and M3 that involves another decision criterion, based on the ISD indicator defined above.
  • FIG. 4c shows an embodiment that gives strictly the same result as FIG. 4b.
  • In it, the evaluation of the selection parameters (block E450) and the downmix selection decisions (block E451) are gathered.
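In the spirit of FIG. 4c, where the selection parameters and decisions are gathered, the FIG. 4b logic can be collapsed so that the hybrid M2 region is resolved directly with the ISD test; only one of M1 or M3 then has to be computed per line. The threshold defaults are placeholders and the function name is hypothetical.

```python
def select_mode_fig4c(iccr, l, r, th0, th1=0.1, th3=0.5):
    """Return 'M1' or 'M3' for one line; same result as FIG. 4b with
    M2 = M3 if ISD[k] > th0 else M1. Thresholds are placeholders."""
    if iccr < th1:
        return "M1"                      # E401 -> E402
    if iccr < th3:                       # E405 -> E410 (M2), resolved with ISD[k]
        return "M3" if abs(l - r) > th0 * abs(l + r) else "M1"
    return "M3"                          # E407


print([select_mode_fig4c(0.3, 1, r, th0=2.0) for r in (1, 0.5j, -0.99)])
```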
  • FIG. 4d illustrates the steps implemented for the channel reduction processing of block 307.
  • This variant embodiment is intended to simplify the decision on the downmix method to be used, this time by not using the passive downmix M1[k]. Indeed, this passive downmix is already included in the hybrid downmix M2[k]; moreover, the hybrid downmix can be considered a more robust variant than the downmix M1[k], because it avoids the problems of phase opposition.
  • If, in step E403, the indicator is below a threshold th2, the downmix processing M2 is implemented in step E410.
  • If, in step E405, the indicator is below a threshold th3, a downmix processing mode combining M2 and M3 is implemented in step E406.
  • If, in step E405, the indicator is greater than the threshold th3, the downmix processing mode M3 is implemented in step E407.
  • The embodiment of FIG. 4d is strictly equivalent to that of FIG. 4b with th1 set to a value below 0.
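The FIG. 4d decision, which drops the passive branch and adds a mixing zone between M2 and M3, can be sketched as below. The sketch assumes th2 < th3 and, for step E406, a linear mixing weight; both the weight law and the name `downmix_4d` are illustrative assumptions, and the candidate spectra M2[k] and M3[k] are taken as already computed.

```python
from typing import List, Sequence


def downmix_4d(
    iccr: float,
    m2: Sequence[complex],
    m3: Sequence[complex],
    th2: float,
    th3: float,
) -> List[complex]:
    """Return the downmix spectrum of one frame following the FIG. 4d logic."""
    if not th2 < th3:
        raise ValueError("this sketch assumes th2 < th3")
    if iccr < th2:                        # step E403 -> E410: hybrid downmix M2
        return list(m2)
    if iccr < th3:                        # step E405 -> E406: mix of M2 and M3
        w3 = (iccr - th2) / (th3 - th2)   # assumed linear weight in [0, 1)
        return [(1.0 - w3) * a + w3 * b for a, b in zip(m2, m3)]
    return list(m3)                       # step E405 -> E407: downmix M3
```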
  • FIG. 4e illustrates the steps implemented for the channel reduction processing of block 307.
  • Here, the indicator characterizing the channels of the multichannel digital audio signal is the phase indicator ISD, which represents a measure of the degree of phase opposition between the channels of the multichannel signal.
  • In step E420, for a stereo signal, this parameter is computed per spectral line as defined in equation (18).
  • If, in step E421, the indicator ISD[k] is greater than a threshold th0, a first downmix processing mode is implemented in step E422.
  • If, in step E421, the indicator ISD[k] is below the threshold th0, a second downmix processing mode is implemented in step E423.
  • In another variant, the main criterion for selecting the downmix mode is again the ISD parameter, as in FIG. 4e, but this parameter is now defined per subband in step E430, as ISD[b], where b is the index of the frequency subband (typically on an ERB or Bark scale).
  • The downmix mode is thus selected per subband. This is similar to the method defined in Annex D of G.722, but more direct, since it does not use the full-band IPD.
  • If, in step E431, the indicator ISD[b] is greater than a threshold th0, a first downmix processing mode is implemented in step E432.
  • If, in step E431, the indicator ISD[b] is below the threshold th0, a second downmix processing mode is implemented in step E433.
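The per-line decision of FIG. 4e and the per-subband variant apply the same comparison with th0 at different resolutions, as in the minimal sketch below. The ISD values are taken as inputs because equation (18) is not reproduced in this section; the band layout passed as `bounds` is hypothetical, and the two modes are simply labelled 1 and 2 since the text does not name them here.

```python
from typing import List, Sequence


def modes_per_line(isd: Sequence[float], th0: float) -> List[int]:
    """FIG. 4e style: one decision per spectral line k (mode 1 if ISD[k] > th0, else mode 2)."""
    return [1 if v > th0 else 2 for v in isd]


def modes_per_subband(
    isd_band: Sequence[float], bounds: Sequence[int], th0: float
) -> List[int]:
    """Per-subband variant: decide once per band b, then apply to all of its lines.

    Band b covers line indices bounds[b] .. bounds[b + 1] - 1 (hypothetical layout).
    """
    if len(bounds) != len(isd_band) + 1:
        raise ValueError("need one more boundary than bands")
    modes: List[int] = []
    for b, v in enumerate(isd_band):
        mode = 1 if v > th0 else 2
        modes.extend([mode] * (bounds[b + 1] - bounds[b]))
    return modes
```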
  • the channels of the multichannel signal, such as the ICCr parameter or the ISD parameter (computed per frame, per subband or per spectral line).
  • A cross-fade could also be applied in the embodiment where the criterion is the ISD indicator.
  • A weighted combination of the three downmixes, $M[k] = p_1 M_1[k] + p_2 M_2[k] + p_3 M_3[k]$, could also be chosen.
  • The weights p1, p2 and p3 are then adapted according to the selection criteria.
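The weighted combination of the three downmixes can be sketched as below. How p1, p2 and p3 follow from the selection criteria is not specified in this section, so the sketch takes them as inputs and, as an added assumption, normalizes them to sum to 1; `combine_downmixes` is a hypothetical name.

```python
from typing import List, Sequence


def combine_downmixes(
    m1: Sequence[complex],
    m2: Sequence[complex],
    m3: Sequence[complex],
    p: Sequence[float],
) -> List[complex]:
    """M[k] = p1*M1[k] + p2*M2[k] + p3*M3[k], with weights normalized to sum to 1."""
    if len(p) != 3 or any(w < 0 for w in p) or sum(p) == 0:
        raise ValueError("expected three non-negative weights, not all zero")
    total = sum(p)
    p1, p2, p3 = (w / total for w in p)  # normalization is an assumption of this sketch
    return [p1 * a + p2 * b + p3 * c for a, b, c in zip(m1, m2, m3)]
```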
  • FIG. 5 gives an example of the evolution of the parameter ICCr for a given signal, with the decision thresholds th3 and th1 set to 0.4 and 0.6, as described in the embodiment of FIG. 4b. These predetermined values are particularly suited to 20 ms frames and can be modified for a different frame length.
  • The figure shows how the ICCr indicator and the SGN indicator fluctuate. It is therefore advisable to adapt the downmix processing as closely as possible to the evolution of this indicator. For example, a high correlation between the signals over frames 100 to 300 allows an adaptive downmix with alignment on a phase reference.
  • When the ICCr indicator is between the thresholds th1 and th3, the signal channels are moderately correlated and potentially out of phase.
  • The downmix to apply then depends on an indicator of phase opposition between the channels. If this indicator reveals phase opposition, it is preferable to select the downmix with alignment on an adaptive phase reference, defined above as M3[k]. Otherwise, the passive downmix with gain compensation, defined above as M1[k], is sufficient.
  • The value of the parameter SGN, also shown in FIG. 5, is used to choose the correct phase reference when the correlation indicator is below a threshold, for example 0.4.
  • In this example, the phase reference therefore switches from L to R around frame 500.
  • The spectra L_buf[k] and R_buf[k] are divided into 35 frequency subbands, defined by the following boundaries:
  • $\mathrm{ICLD}[b] = 10 \log_{10}\left(\frac{\sigma_L^2[b]}{\sigma_R^2[b]}\right)$ (21), where $\sigma_L^2[b]$ and $\sigma_R^2[b]$ represent the energies of the left channel (L_buf[k]) and the right channel (R_buf[k]) in subband b.
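Equation (21) can be evaluated per subband as in the sketch below. The 35 band boundaries are not reproduced in this section, so they are passed in as `bounds`; the small `eps` that guards against silent bands is an added assumption, as is the name `icld_per_band`.

```python
import math
from typing import List, Sequence


def icld_per_band(
    l_buf: Sequence[complex],
    r_buf: Sequence[complex],
    bounds: Sequence[int],
    eps: float = 1e-12,
) -> List[float]:
    """ICLD[b] = 10*log10(sigma_L^2[b] / sigma_R^2[b]) for each subband b.

    Band b covers lines bounds[b] .. bounds[b + 1] - 1.
    """
    out: List[float] = []
    for b in range(len(bounds) - 1):
        lo, hi = bounds[b], bounds[b + 1]
        e_l = sum(abs(x) ** 2 for x in l_buf[lo:hi])  # energy of L in band b
        e_r = sum(abs(x) ** 2 for x in r_buf[lo:hi])  # energy of R in band b
        out.append(10.0 * math.log10((e_l + eps) / (e_r + eps)))
    return out
```

For example, a band in which L carries four times the energy of R yields about 6.02 dB.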
  • The ICLD parameters are coded by differential non-uniform scalar quantization (block 315). This quantization is not detailed here because it is beyond the scope of the invention.
  • The ICPD and ICC parameters are encoded by methods known to those skilled in the art, for example by uniform scalar quantization over the appropriate interval.
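A uniform scalar quantizer of the kind mentioned for ICPD and ICC can be sketched as follows. The section gives neither the intervals nor the bit allocations; the intervals [-pi, pi] for ICPD and [0, 1] for ICC used in the tests, and the 3-bit resolution, are illustrative assumptions, and `uniform_quantize` is a hypothetical name.

```python
import math
from typing import Tuple


def uniform_quantize(x: float, lo: float, hi: float, bits: int) -> Tuple[int, float]:
    """Quantize x on 2**bits evenly spaced levels spanning [lo, hi].

    Returns (index, reconstructed value). Values outside [lo, hi] are clipped.
    """
    if bits < 1 or hi <= lo:
        raise ValueError("need bits >= 1 and hi > lo")
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    idx = int(round((min(max(x, lo), hi) - lo) / step))
    idx = min(max(idx, 0), levels - 1)
    return idx, lo + idx * step
```

Only the index is transmitted; the decoder recomputes `lo + idx * step` from the same interval and bit count.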
  • This decoder comprises a demultiplexer 501 from which the coded mono signal is extracted and then decoded in block 502, in this example by a mono EVS decoder.
  • The part of the bitstream corresponding to the EVS mono encoder is decoded according to the bit rate used at the encoder. To simplify the description, it is assumed here that there are no frame losses or bit errors in the bitstream; known frame loss correction techniques can of course be implemented in the decoder.
  • In the absence of channel errors, the decoded mono signal corresponds to M(n).
  • A short-term discrete Fourier transform analysis with the same windowing as at the encoder is performed on M(n) (blocks 503 and 504) to obtain the spectrum M[k]. A decorrelation in the frequency domain (block 520) is also assumed to be applied here.
  • The part of the bitstream associated with the stereo extension is also demultiplexed.
  • The ICLD, ICPD and ICC parameters are decoded to obtain ICLD_q[b], ICPD_q[b] and ICC_q[b] (blocks 505 to 507).
  • The decoded mono signal may be decorrelated, for example in the frequency domain (block 520).
  • The implementation details of block 508 are not presented here because they are beyond the scope of the invention; conventional techniques known to those skilled in the art can be used.
  • The spectra L[k] and R[k] are thus computed and then converted to the time domain by inverse FFT, windowing and overlap-add (blocks 509 to 514) to obtain the synthesized channels L(n) and R(n).
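The synthesis chain of blocks 509 to 514 (inverse transform, windowing and overlap-add) can be illustrated with the self-contained sketch below. It assumes a sine window applied at both analysis and synthesis with 50% overlap, which satisfies w[n]^2 + w[n + N/2]^2 = 1, and uses a naive O(N^2) DFT instead of an FFT for readability; the frame length, window and function names are assumptions, not the codec's actual parameters.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def sine_window(n: int) -> List[float]:
    # w[i]^2 + w[i + n/2]^2 == 1, so analysis + synthesis windowing with hop n/2 reconstructs.
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def analysis(x: Sequence[float], n: int) -> List[List[complex]]:
    """Windowed short-term spectra with hop n/2 (encoder-side analysis)."""
    w, hop = sine_window(n), n // 2
    return [dft([x[s + i] * w[i] for i in range(n)]) for s in range(0, len(x) - n + 1, hop)]


def synthesis(spectra: Sequence[Sequence[complex]], n: int, length: int) -> List[float]:
    """Inverse DFT, synthesis windowing and overlap-add (decoder-side)."""
    w, hop = sine_window(n), n // 2
    y = [0.0] * length
    for f, spec in enumerate(spectra):
        frame = idft(spec)
        for i in range(n):
            y[f * hop + i] += frame[i].real * w[i]
    return y
```

Without any modification of the spectra, samples covered by two overlapping frames are reconstructed exactly; the first and last half-frames are covered by a single frame only.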
  • The encoder presented with reference to FIG. 3 and the decoder presented with reference to FIG. 7 have been described for the particular application of stereo coding and decoding.
  • The invention has been described on the basis of a decomposition of the stereo channels by discrete Fourier transform.
  • The invention also applies to other complex representations, for example the Modulated Complex Lapped Transform (MCLT) decomposition.
  • MCLT: Modulated Complex Lapped Transform; MDCT: modified discrete cosine transform; MDST: modified discrete sine transform; PQMF: pseudo-quadrature mirror filter.
  • The downmix that is the subject of the invention can be used not only for encoding but also at decoding, to generate a mono signal at the output of a stereo decoder or receiver in order to ensure compatibility with mono-only equipment. This can be the case, for example, when switching from headphone playback to playback on a loudspeaker.
  • FIG. 8 illustrates this embodiment: a decoded stereo signal (L(n), R(n)) is transformed by blocks 601, 602 and 603, 604 respectively to obtain the left and right spectra L[k] and R[k].
  • The processing block 605 comprises a module 605a for obtaining at least one indicator characterizing the channels of the received multichannel signal, here the stereo signal.
  • This indicator may be, for example, an interchannel correlation indicator or an indicator of the degree of phase opposition between channels.
  • The selection block 605b selects, from a set of downmix processing modes, a downmix processing mode that is applied in 605c to the input signals, here the stereo signal L[k], R[k], to give a mono signal M[k].
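The structure of processing block 605 (obtain an indicator, select a mode, apply it) can be sketched as below. The ICCr formula used here, a normalized magnitude of the cross-spectrum, is an assumed stand-in because its definition is not part of this section. The two downmix functions are deliberately simplified placeholders (a plain average for the passive branch, and the mean magnitude carried on the phase of L for the phase-aligned branch), not the patent's M1[k] and M3[k]; the threshold `th` and all names are hypothetical.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[complex]


def iccr(l: Spectrum, r: Spectrum) -> float:
    """Assumed indicator: |sum L R*| / sqrt(sum|L|^2 * sum|R|^2), in [0, 1]."""
    num = abs(sum(a * b.conjugate() for a, b in zip(l, r)))
    den = math.sqrt(sum(abs(a) ** 2 for a in l) * sum(abs(b) ** 2 for b in r))
    return num / den if den > 0 else 0.0


def passive(l: Spectrum, r: Spectrum) -> List[complex]:
    # Placeholder passive downmix: plain average, without gain compensation.
    return [(a + b) / 2 for a, b in zip(l, r)]


def phase_aligned(l: Spectrum, r: Spectrum) -> List[complex]:
    # Placeholder: mean magnitude carried on the phase of L (fixed reference).
    return [(abs(a) + abs(b)) / 2 * cmath.exp(1j * cmath.phase(a)) for a, b in zip(l, r)]


MODES: Dict[str, Callable[[Spectrum, Spectrum], List[complex]]] = {
    "passive": passive,
    "phase_aligned": phase_aligned,
}


def process_block_605(l: Spectrum, r: Spectrum, th: float) -> List[complex]:
    indicator = iccr(l, r)                                    # module 605a
    mode = "passive" if indicator < th else "phase_aligned"   # module 605b
    return MODES[mode](l, r)                                  # module 605c
```

With two channels in exact phase opposition, the plain average would cancel to zero, whereas the phase-aligned placeholder keeps the signal energy.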
  • The encoders and decoders described with reference to FIGS. 3, 7 and 8 may be integrated into multimedia equipment such as a set-top box or an audio or video content player. They can also be integrated into communication equipment such as a mobile phone or a communication gateway.
  • The case of a downmix of a 5.1 signal to a stereo signal is now considered.
  • A 5.1 surround signal is defined as a set of 6 channels: L (front left), C (center), R (front right), Ls (left surround or rear left), Rs (right surround or rear right) and LFE (low-frequency effects or subwoofer).
  • Two variants of 5.1-to-stereo downmix can be applied according to the invention:
  • The C and LFE channels can be combined by passive downmix, and the result can be combined separately with the L and R channels by applying the 2-channel (stereo) to 1-channel (mono) downmix embodiments, to obtain the channels L′ and R′ respectively. Then, the channels L′ and R′ can in turn be combined with Ls and Rs respectively, again by applying the 2-channel to 1-channel downmix embodiments, to obtain the channels L″ and R″, which are the result of the downmix.
  • This implementation therefore calls "hierarchically" (in successive steps) on the elementary 2-to-1 downmix described above, in its different variants.
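The hierarchical 5.1-to-stereo variant can be sketched as a composition of any 2-to-1 downmix. Here `downmix2` is a hypothetical parameter standing for one of the stereo-to-mono embodiments; the passive C+LFE combination is assumed to be a plain sum (the text only says "passive downmix"), and each channel is given as a per-line spectrum.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Spectrum = Sequence[complex]
Downmix2 = Callable[[Spectrum, Spectrum], List[complex]]


def downmix_51_to_stereo(
    ch: Dict[str, Spectrum], downmix2: Downmix2
) -> Tuple[List[complex], List[complex]]:
    """Hierarchical 5.1 -> stereo: (C+LFE) into L and R, then Ls and Rs."""
    c_lfe = [c + e for c, e in zip(ch["C"], ch["LFE"])]  # assumed passive sum
    l1 = downmix2(ch["L"], c_lfe)   # L'
    r1 = downmix2(ch["R"], c_lfe)   # R'
    l2 = downmix2(l1, ch["Ls"])     # L''
    r2 = downmix2(r1, ch["Rs"])     # R''
    return l2, r2
```

Passing the elementary downmix as a function keeps the hierarchy independent of which stereo-to-mono variant is chosen at each step.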
  • Alternatively, the invention can be generalized to combine 3 channels simultaneously, L, Ls and C+LFE on one side and R, Rs and C+LFE on the other, where C+LFE is the result of a simple passive downmix, so as to obtain the two channels L″ and R″ directly.
  • As in the stereo case, several downmixes can be defined: a passive downmix M1[k] of the 3 signals with gain compensation, and a downmix M3[k] of the 3 signals with phase alignment on an adaptive reference (the dominant signal among the 3).
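  • The downmix is then obtained according to the generalization:
$M[k] = p_1(\mathrm{ICCr}_{12}, \mathrm{ICCr}_{13}, \mathrm{ICCr}_{23})\, M_1[k] + p_3(\mathrm{ICCr}_{12}, \mathrm{ICCr}_{13}, \mathrm{ICCr}_{23})\, M_3[k]$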
  • The weights p1 and p3 are multivariate functions, for example of the correlations ICCr_ij between each pair of channels i and j (for example among L, Ls and C+LFE), taken two at a time.
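The generalized 3-channel combination can be sketched as below. This section does not specify the functions p1 and p3, so this sketch assumes, purely for illustration, p3 = mean of the pairwise ICCr values and p1 = 1 - p3, and uses an assumed normalized cross-spectrum magnitude for ICCr_ij; M1[k] and M3[k] are taken as already computed, and all names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Spectrum = Sequence[complex]


def pair_iccr(x: Spectrum, y: Spectrum) -> float:
    """Assumed ICCr_ij: |sum X Y*| / sqrt(sum|X|^2 * sum|Y|^2)."""
    num = abs(sum(a * b.conjugate() for a, b in zip(x, y)))
    den = math.sqrt(sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def generalized_downmix(
    channels: Sequence[Spectrum], m1: Spectrum, m3: Spectrum
) -> List[complex]:
    """M[k] = p1 * M1[k] + p3 * M3[k], with hypothetical weights derived from pairwise ICCr."""
    if len(channels) < 2:
        raise ValueError("need at least two channels")
    pairs = list(itertools.combinations(range(len(channels)), 2))
    corr = [pair_iccr(channels[i], channels[j]) for i, j in pairs]
    p3 = sum(corr) / len(corr)   # hypothetical: more correlation -> more weight on M3
    p1 = 1.0 - p3
    return [p1 * a + p3 * b for a, b in zip(m1, m3)]
```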
  • More generally, the numbers of input and output channels of the downmix may differ from the stereo-to-mono and 5.1-to-stereo cases illustrated here.
  • FIG. 9 shows an exemplary embodiment of such equipment, into which an encoder as described with reference to FIG. 3, or a processing device as described with reference to FIG. 8, according to the invention, is integrated.
  • This device comprises a processor PROC cooperating with a memory block BM having a storage and/or working memory MEM.
  • The memory block can advantageously store a computer program comprising code instructions for implementing the steps of the coding method, or of the processing method, according to the invention when these instructions are executed by the processor PROC, in particular the steps of extracting at least one indicator characterizing the channels of the multichannel digital audio signal and of selecting, from a set of channel reduction processing modes, a channel reduction processing mode according to the value of the at least one indicator characterizing the channels of the multichannel audio signal.
  • These instructions are executed for channel reduction processing when encoding a multichannel signal or processing a decoded multichannel signal.
  • The program may include the steps implemented to code the information suited to this processing.
  • The memory MEM can store the various downmix processing modes from which the selection is made according to the method of the invention.
  • FIGS. 3 and 4a to 4f show the steps of an algorithm of such a computer program.
  • The computer program can also be stored on a memory medium readable by a reader of the device or equipment, or be downloadable into its memory space.
  • Such equipment or encoder comprises an input module capable of receiving a multichannel signal, for example a stereo signal comprising the right and left channels R and L, either via a communication network or by reading content stored on a storage medium.
  • This multimedia equipment may also include means for capturing such a stereo signal.
  • The device comprises an output module adapted to transmit a mono signal M resulting from the selected channel reduction processing according to the invention and, in the case of an encoding device, the coded spatial information parameters Pc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)
EP16825835.8A 2015-12-16 2016-12-13 Adaptive kanalreduktionsverarbeitung zur codierung eines mehrkanalaudiosignals Pending EP3391370A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1562485A FR3045915A1 (fr) 2015-12-16 2015-12-16 Traitement de reduction de canaux adaptatif pour le codage d'un signal audio multicanal
PCT/FR2016/053353 WO2017103418A1 (fr) 2015-12-16 2016-12-13 Traitement de réduction de canaux adaptatif pour le codage d'un signal audio multicanal

Publications (1)

Publication Number Publication Date
EP3391370A1 (de) 2018-10-24

Family

ID=55646738

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16825835.8A Pending EP3391370A1 (de) 2015-12-16 2016-12-13 Adaptive kanalreduktionsverarbeitung zur codierung eines mehrkanalaudiosignals

Country Status (5)

Country Link
US (1) US10553223B2 (de)
EP (1) EP3391370A1 (de)
CN (1) CN108369810B (de)
FR (1) FR3045915A1 (de)
WO (1) WO2017103418A1 (de)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269577B (zh) 2016-12-30 2019-10-22 华为技术有限公司 立体声编码方法及立体声编码器
CN109427337B (zh) * 2017-08-23 2021-03-30 华为技术有限公司 立体声信号编码时重建信号的方法和装置
GB201718341D0 (en) 2017-11-06 2017-12-20 Nokia Technologies Oy Determination of targeted spatial audio parameters and associated spatial audio playback
GB2572650A (en) 2018-04-06 2019-10-09 Nokia Technologies Oy Spatial audio parameters and associated spatial audio playback
GB2574239A (en) * 2018-05-31 2019-12-04 Nokia Technologies Oy Signalling of spatial audio parameters
WO2020094263A1 (en) * 2018-11-05 2020-05-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and audio signal processor, for providing a processed audio signal representation, audio decoder, audio encoder, methods and computer programs
CN111332197B (zh) * 2020-03-09 2021-08-03 湖北亿咖通科技有限公司 一种车载娱乐系统的灯光控制方法、装置及车载娱乐系统
JP7396459B2 (ja) * 2020-03-09 2023-12-12 日本電信電話株式会社 音信号ダウンミックス方法、音信号符号化方法、音信号ダウンミックス装置、音信号符号化装置、プログラム及び記録媒体

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102004042819A1 (de) * 2004-09-03 2006-03-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Erzeugen eines codierten Multikanalsignals und Vorrichtung und Verfahren zum Decodieren eines codierten Multikanalsignals
US9082395B2 (en) * 2009-03-17 2015-07-14 Dolby International Ab Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
KR101756838B1 (ko) * 2010-10-13 2017-07-11 삼성전자주식회사 다채널 오디오 신호를 다운 믹스하는 방법 및 장치
FR2966634A1 (fr) * 2010-10-22 2012-04-27 France Telecom Codage/decodage parametrique stereo ameliore pour les canaux en opposition de phase
CN102446507B (zh) * 2011-09-27 2013-04-17 华为技术有限公司 一种下混信号生成、还原的方法和装置
EP2830053A1 (de) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Mehrkanaliger Audiodecodierer, mehrkanaliger Audiocodierer, Verfahren und Computerprogramm mit restsignalbasierter Anpassung einer Beteiligung eines dekorrelierten Signals

Also Published As

Publication number Publication date
FR3045915A1 (fr) 2017-06-23
CN108369810A (zh) 2018-08-03
US10553223B2 (en) 2020-02-04
WO2017103418A1 (fr) 2017-06-22
US20190156841A1 (en) 2019-05-23
CN108369810B (zh) 2024-04-02

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20180709

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20201215

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE