WO2002063925A2 - Audio channel translation - Google Patents

Audio channel translation

Info

Publication number
WO2002063925A2
Authority
WO
WIPO (PCT)
Prior art keywords
channels
channel
output
signal
input channels
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2002/003619
Other languages
English (en)
French (fr)
Other versions
WO2002063925A3 (en)
WO2002063925A8 (en)
Inventor
Mark Franklin Davis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority to MXPA03007064A (es)
Priority to DE60225806T (DE60225806T2, de)
Priority to CA2437764A (CA2437764C, en)
Priority to JP2002563741A (JP2004526355A, ja)
Priority to US10/467,213 (US20040062401A1, en)
Priority to AU2002251896A (AU2002251896B2, en)
Priority to HK04109904.2A (HK1066966B)
Priority to EP02720929A (EP1410686B1, en)
Application filed by Dolby Laboratories Licensing Corp
Priority to KR1020037010231A (KR100904985B1, ko)
Publication of WO2002063925A2 (en)
Priority to US10/522,515 (US7660424B2, en)
Anticipated expiration
Publication of WO2002063925A3 (en)
Publication of WO2002063925A8 (en)
Ceased (current legal status)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation, of the pseudo five- or more-channel type, e.g. virtual surround

Definitions

  • The invention relates to audio signal processing. More particularly, the invention relates to translating M audio input channels representing a soundfield to N audio output channels representing the same soundfield, wherein each channel is a single audio stream representing audio arriving from a direction, M and N are positive integers, and M is at least 2.
  • Representative systems include the panned-mono three-speaker film soundtracks of the early 1950s, conventional stereo sound, quadraphonic systems of the 1960s, five-channel discrete magnetic soundtracks on 70 mm films, Dolby Surround using a matrix in the 1970s, AC-3 5.1-channel sound of the 1990s and, more recently, Surround EX 6.1-channel sound.
  • "Dolby", "Pro Logic" and "Surround EX" are trademarks of Dolby Laboratories Licensing Corporation. To one degree or another, these systems provide enhanced spatial reproduction compared to monophonic presentation.
  • A first step toward practicality can be taken by noting that the signal of interest is bandlimited, at about 20 kHz, permitting the application of the Spatial Sampling theorem, a variant of the more common Temporal Sampling theorem.
  • The latter holds that there is no loss of information if a continuous bandlimited temporal waveform is discretely sampled at a rate at least twice the highest frequency of the source.
  • The former follows from the same considerations and stipulates that, to avoid information loss, there must be at least two spatial samples per shortest wavelength, that is, the spatial sampling interval must be no more than half the shortest wavelength. Since the wavelength of 20 kHz in air is about 3/8", the implication is that an accurate 3D sound system can be implemented with an array of microphones and loudspeakers spaced no more than 3/16" apart.
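The sampling arithmetic can be checked with a few lines of Python. This sketch assumes a speed of sound of 343 m/s (air at roughly 20 °C); the constant and function names are illustrative. With that value the 20 kHz wavelength works out to about 0.68 in (17 mm), roughly twice the 3/8" quoted above, which would put the half-wavelength spacing nearer 0.34 in than 3/16".

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
METERS_PER_INCH = 0.0254


def temporal_nyquist_rate_hz(max_freq_hz):
    """Lowest sample rate that loses no information (temporal sampling theorem)."""
    return 2.0 * max_freq_hz


def wavelength_m(freq_hz, c=SPEED_OF_SOUND_M_S):
    """Acoustic wavelength in metres for a given frequency."""
    return c / freq_hz


def max_spatial_spacing_m(max_freq_hz, c=SPEED_OF_SOUND_M_S):
    """Two spatial samples per shortest wavelength, i.e. spacing <= wavelength / 2."""
    return wavelength_m(max_freq_hz, c) / 2.0


if __name__ == "__main__":
    f_max = 20_000.0
    print(f"temporal Nyquist rate: {temporal_nyquist_rate_hz(f_max):.0f} Hz")
    print(f"wavelength at 20 kHz: {wavelength_m(f_max) / METERS_PER_INCH:.3f} in")
    print(f"max element spacing:  {max_spatial_spacing_m(f_max) / METERS_PER_INCH:.3f} in")
```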
  • Any output channel with a location that does not correspond to the position of one of the cardinal channels will be referred to as an "intermediate" channel.
  • An output channel may also have a location coincident with the position of a cardinal input channel.
  • Because spatial sampling at such densities is impractical, it is desirable to reduce the number of discrete channel spatial samples, or cardinal channels.
  • One possible basis for doing so is the fact that, above 1500 Hz, the ear no longer follows individual cycles, only the critical-band envelope. This might allow a channel spacing commensurate with 1500 Hz, or about 3". This would reduce the total for the 9' x 12' room to about 6000 channels, a useful saving of about 2.49 million channels compared to the previous arrangement.
  • When multiple sets of output channels are associated with more than two input channels, the process determines the correlation of the input channels associated with each set of output channels according to a hierarchical order, in which each set is ranked by the number of input channels with which its output channel or channels are associated (the greatest number of input channels having the highest ranking), and the sets are processed in that hierarchical order. Further, according to an aspect of the present invention, the processing takes into account the results of processing higher-order sets.
  • Each of the M audio input channels representing audio arriving from a direction was generated by a passive-matrix, nearest-neighbor, amplitude-panned encoding of each source direction (i.e., a source direction is assumed to map primarily to the nearest cardinal channel or channels), without requiring additional side-chain information (the use of side-chain or auxiliary information is optional), making the approach compatible with existing mixing techniques, consoles, and formats.
  • Although source signals may be generated by explicitly employing a passive encoding matrix, most conventional recording techniques inherently generate such source signals (thus constituting an "effective encoding matrix").
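A minimal sketch of a passive, nearest-neighbor, amplitude-panned encoding for a horizontal ring of cardinal channels. The sine/cosine (constant-power) pan law, the five cardinal azimuths and the function name `encode_gains` are illustrative assumptions, not taken from the patent; the point is only that each source direction lands in the one or two nearest cardinal channels with gains whose squares sum to 1.

```python
import math

# Hypothetical cardinal layout (degrees, clockwise from front): C, R, Rs, Ls, L.
CARDINALS = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": 250.0, "L": 330.0}


def encode_gains(source_az_deg, cardinals=CARDINALS):
    """Pan a source to the two cardinals bracketing its azimuth (constant power)."""
    names = sorted(cardinals, key=cardinals.get)  # order cardinals by azimuth
    az = source_az_deg % 360.0
    gains = {name: 0.0 for name in names}
    for i, a_name in enumerate(names):
        b_name = names[(i + 1) % len(names)]       # next cardinal around the ring
        a = cardinals[a_name]
        span = (cardinals[b_name] - a) % 360.0     # angular width of this pair
        offset = (az - a) % 360.0
        if offset <= span:
            frac = offset / span                   # 0 at cardinal a, 1 at cardinal b
            gains[a_name] = math.cos(frac * math.pi / 2)
            gains[b_name] = math.sin(frac * math.pi / 2)
            return gains
    raise ValueError("azimuth not bracketed")      # unreachable for a closed ring


print({k: round(v, 3) for k, v in encode_gains(15.0).items()})
```

A source halfway between C and R gets equal gains in those two channels and nothing elsewhere, which is the nearest-neighbor mapping the decoder later assumes.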
  • The playback or decoding aspects of the present invention are also largely compatible with natural recording source signals, such as might be made with five real directional microphones, since, allowing for some possible time delay, sounds arriving from intermediate directions tend to map principally to the nearest microphones (in a horizontal array, specifically to the nearest pair of microphones).
  • A decoder or decoding process may be implemented as a lattice of coupled processing modules or modular functions (hereinafter, "decoding modules"), each of which is used to generate one or more output channels (or, alternatively, control signals usable to generate one or more output channels) from two or more of the closest spatially adjacent cardinal channels associated with the decoding module.
  • The output channels represent relative proportions of the audio signals in the closest spatially adjacent cardinal channels associated with the particular decoding module.
  • The decoding modules are loosely coupled to each other in the sense that modules share nodes and there is a hierarchy of decoding modules.
  • Modules are ordered in the hierarchy according to the number of cardinal channels they are associated with (the module or modules with the highest number of associated cardinal channels are ranked highest).
  • A supervisory routine or function presides over the modules so that common node signals are equitably shared and higher-order decoder modules may affect the output of lower-order modules.
  • Each decoder module may, in effect, include a matrix such that it directly generates output signals, or each decoder module may generate control signals that are used, along with the control signals generated by other decoder modules, to vary the coefficients of a variable matrix or the scale factors of inputs to or outputs from a fixed matrix in order to generate all of the output signals.
  • Decoder modules emulate the operation of the human ear in an attempt to provide perceptually transparent reproduction.
  • Each decoder module may be implemented as either a wideband or a multiband structure or function; in the latter case it may use either a continuous filterbank or a block structure (for example, a transform-based processor), applying, for example, the same essential processing in each band.
  • The basic invention relates generally to the spatial translation of M input channels to N output channels, wherein M and N are positive integers and M is at least two.
  • Another aspect of this invention is that the number of speakers receiving the N output channels can be reduced to a practical number by judicious reliance upon virtual imaging, that is, the creation of perceived sonic images at positions in space other than where a loudspeaker is located.
  • The most common use of virtual imaging is in the stereo reproduction of an image partway between two speakers, by panning a mono signal between the channels. Virtual imaging is not considered a viable technique for group presentation with a sparse number of channels, because it requires the listener to be equidistant from the two speakers, or nearly so.
  • The left and right front speakers are too far apart to obtain useful phantom imaging of a center image for much of the audience, so, given the importance of the center channel as the source of much of the dialog, a physical center speaker is used instead.
  • FIG. 1 is a top plan view showing schematically an idealized decoding arrangement in the manner of the just-described test arrangement.
  • Five wide-range horizontal cardinal channels are shown as squares 1', 3', 5', 9' and 13' on the outer circle.
  • The twenty-three wide-range output channels are shown as numbered filled circles 1-23.
  • The outer circle of sixteen output channels is on a horizontal plane; the inner circle of six output channels is forty-five degrees above the horizontal plane.
  • Output channel 23 is directly above one or more listeners.
  • Each module is associated with a respective pair or trio of closest spatially adjacent cardinal channels.
  • Although the decoding modules represented in FIG. 1 have three, four or five output channels, a decoding module may have any reasonable number of output channels.
  • An output channel may be located intermediate to two or more cardinal channels or at the same position as a cardinal channel.
  • In FIG. 1, each of the cardinal channel locations is also an output channel.
  • Two or three decoding modules share each input channel.
  • A design goal of this invention is that the playback processor should be capable, in concept, of working with an arbitrary number and arrangement of speakers, so the 24-channel array will be used as an illustrative but non-unique example of the density and arrangement required to achieve a convincing perceived continuum soundfield according to one aspect of the invention.
  • The desire to be able to use a large, and possibly user-selectable, number of presentation channels raises the question of the number of discrete channels, and/or other information, that must be conveyed to the playback processor in order for it to derive, at least as one option, the twenty-four channels described above.
  • One possible approach is simply to transmit twenty-four discrete channels. Aside from the fact that it would likely be onerous for content producers to mix that many separate channels, and for a transmission medium to convey as many channels, it is preferred not to do so, because the 24-channel arrangement is merely one of many possible arrangements and it is desired to allow for more or fewer presentation channels from a common transmitted signal array.
  • One way to recover output channels is to use formal spatial interpolation, a fixed weighted sum of the transmitted channels for each output, assuming the density of such channels is sufficiently great to allow for that.
  • This would require from thousands to millions of transmitted channels, analogous to the use of a multi-hundred-tap FIR filter to perform temporal interpolation of a single signal.
  • Reduction to a practical number of transmitted channels requires the application of psychoacoustic principles and more aggressive, dynamic interpolation from far fewer channels, still leaving unanswered the question of just how many channels are needed to impart the percept of a complete soundfield.
  • It should be possible for a channel translation decoder to accept a standard 5.1-channel program and convincingly present it through an arbitrary number of horizontally arrayed speakers, including the sixteen horizontal speakers of the twenty-four-channel array described earlier.
  • With the addition of a vertical channel, such as is sometimes proposed for digital cinema systems, it should be possible to feed the entire twenty-four-channel array with individually derived, perceptually valid signals that together impart a continuum soundfield percept at most listening positions.
  • The channel translation decoder is not limited to operation with 5.1-channel sources and may use fewer or more channels, but there is at least some justification for the belief that credible performance can be obtained from 5.1-channel sources.
  • The channel translation decoder consists of a series of modular interpolating signal processors, each in effect emulating an optimally placed listener and each functioning in a manner analogous to the human auditory system to extract what would otherwise be virtual images from amplitude-panned signals and feed them to real loudspeakers, with the speakers preferably arrayed densely enough that natural virtual imaging can fill in the remaining gaps between them.
  • Each decoding module derives its inputs from the nearest transmitted cardinal channels, which, for example, for a canopy (overhead) array of speakers, may be three or more cardinal channels.
  • One way of generating output channels involving more than two cardinal channels might be to employ a series of pair-wise operations, with, e.g., outputs of some pair-wise decoding modules feeding the inputs of other modules.
  • This approach has two drawbacks.
  • First, cascading decoding modules introduces multiple cascaded time constants, resulting in some output channels responding more quickly than others and causing audible position artifacts.
  • The second drawback is that pair-wise correlation alone can only place intermediate or derived output channels along the line between the pair; the use of three or more cardinals removes this restriction. Consequently, an extension to common pair-wise correlation has been developed to correlate three or more output signals; this technique is described below.
  • Horizontal localization in the human ear is predicated primarily upon two localization cues: interaural amplitude differences and interaural time differences.
  • The latter cue is only valid for signal pairs in near time alignment, within roughly 600 microseconds.
  • The practical effect is that phantom intermediate images will only occur at positions corresponding to a particular left/right amplitude difference, assuming the common signal content in the two real channels is correlated, or nearly so. (Note: two signals can have cross-correlation values that span from +1 to -1.)
  • Vertical localization is a little more complex, relying on HRTF pinna cues and dynamic modulation of the horizontal cues with head motion, but the final effect is similar to horizontal localization with respect to panned amplitudes, cross-correlation, and the corresponding perceived image position and fusion.
  • Vertical spatial resolution is, however, less precise than horizontal resolution, and does not require as dense an array of cardinal channels for adequate interpolation performance.
  • An advantage of using directional processors that emulate the operation of the human ear is that any imperfections or limitations of the signal processing should be perceptually masked by like imperfections and limitations of the human ear, allowing for the possibility that the system will be perceived as nearly indistinguishable from the original full continuum presentation.
  • Although the present invention is designed to make effective use of however many or few output channels are available (including playback via as many loudspeakers as there are input channels with no decoding, and passive mixdown to fewer channels, including mono, stereo and surround-compatible Lt/Rt), it is preferably intended to employ a large and somewhat arbitrary, but nonetheless practical, number of presentation channels/loudspeakers, and to use as source material a similar or smaller number of encoded channels, including existing 5.1-channel surround tracks and possible next-generation 11- or 12-channel digital cinema soundtracks.
  • Implementations of the present invention should desirably exhibit four principles: error containment, dominant containment, constant power, and synchronized smoothing.
  • Error containment refers to the notion that, given the likelihood of decoding errors, the decoded position of each source should be in some reasonable sense near its true, intended direction. This mandates a certain degree of conservatism in decoding strategy. Faced with the prospect of more aggressive decoding accompanied by possibly greater spatial disparity in the event of errors, it is usually preferable to accept less precise decoding in exchange for assured spatial containment. Even in situations in which more precise decoding can confidently be applied, it may be unwise to do so if there is a likelihood that dynamic signal conditions will require the decoder to ratchet between aggressive and conservative modes, resulting in audible artifacts.
  • Dominant containment, a more constrained variant of error containment, is the requirement that a single well-defined dominant signal should be panned by the decoder to only nearest-neighbor output channels. This condition is necessary to maintain image fusion for dominant signals, and contributes to the perceived discreteness of a matrix decoder. While a signal is dominant, it is suppressed from other output channels, either by subtracting it from the associated cardinal signals, or by directly applying to other output channels matrix coefficients complementary to those used to derive the dominant signal ("anti-dominant coefficients/signal").
  • Synchronized smoothing applies to systems with signal dependent smoothing time constants, and requires that if any smoothing network within a decoding module is switched to a fast time constant mode, all other smoothing networks within the module be similarly switched. This is to avoid having a newly dominant directional signal appear to slowly fade/pan from the previous dominant direction.
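Synchronized smoothing can be sketched as a set of one-pole (leaky-integrator) smoothers inside one decoding module that share a single fast/slow mode decision: if any smoother's input jumps enough to warrant the fast time constant, every smoother in the module switches together. The class name `SynchronizedSmoothers`, the relative-jump trigger and the two coefficient values are hypothetical choices for illustration, not the patent's.

```python
class SynchronizedSmoothers:
    """Leaky integrators within one decoding module that share a time-constant mode.

    Hypothetical sketch: fast mode is triggered when any input exceeds its smoothed
    value by more than `trigger_ratio` (relative), and then applies to all smoothers.
    """

    def __init__(self, n, slow_alpha=0.01, fast_alpha=0.2, trigger_ratio=4.0):
        self.state = [0.0] * n
        self.slow_alpha = slow_alpha
        self.fast_alpha = fast_alpha
        self.trigger_ratio = trigger_ratio

    def update(self, inputs):
        # Decide the mode once for the whole module (the synchronization step).
        fast = any(
            x > self.trigger_ratio * max(s, 1e-12)
            for x, s in zip(inputs, self.state)
        )
        alpha = self.fast_alpha if fast else self.slow_alpha
        # Apply the same coefficient to every smoother: s += alpha * (x - s).
        self.state = [s + alpha * (x - s) for x, s in zip(inputs, self.state)]
        return fast, list(self.state)


sm = SynchronizedSmoothers(3)
for _ in range(500):
    sm.update([1.0, 1.0, 1.0])       # settle on a steady signal
print(sm.update([0.5, 10.0, 1.0]))   # one input jumps: all three switch to fast
```

Because the mode is chosen per module rather than per smoother, the level estimates for the old and new dominant directions move on the same time scale, which is what keeps a newly dominant signal from appearing to pan slowly from the previous direction.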
  • FIG. 1 is a schematic drawing showing a top plan view of an idealized decoder arrangement.
  • Channel translation decoding is based on a series of semi-autonomous decoding modules which, in a general sense, recover output channels, particularly intermediate output channels, each usually from a subset of all the transmitted channels, in a fashion similar to the human ear.
  • The operation of the decoding module is based on a combination of amplitude ratios, to determine the nominal ongoing primary direction, and cross-correlation, to determine the relative width of the image.
  • Using control information derived from the amplitude ratios and cross-correlation, the processor then extracts the output channel audio signals. Since this is best done on a linear basis, to avoid generating distortion products, the decoder forms weighted sums of the cardinal channels containing the signal of interest. (As explained below, it may also be desirable to include information about non-neighbor cardinals in the calculation of the weighted sum.) This limited but dynamic form of interpolation is more commonly referred to as matrixing. If, in the source, the desired signal is mapped (amplitude panned) to the nearest M cardinal channels, then the problem is one of M:N matrix decoding. In other words, the output channels represent relative proportions of the input channels.
  • In a conventional four-input matrix system that uses interchannel phase to code front/back position, an uncorrelated Left-In and Right-In signal pair with no Front-In or Rear-In will map to the same net, uncorrelated Lt/Rt pair as will an uncorrelated Front-In/Rear-In pair with no Left-In/Right-In, or, for that matter, all four inputs uncorrelated.
  • The decoder, faced with an uncorrelated Lt/Rt signal pair, has no choice but to "relax the matrix", that is, to use a passive matrix that distributes sound to all output channels. It is incapable of decoding to a simultaneous Left-Out/Right-Out-only or Front-Out/Rear-Out-only signal array.
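The ambiguity can be checked numerically. This sketch assumes a conventional 4:2 amplitude/phase encoder of the form Lt = L + 0.707·F + 0.707·B and Rt = R + 0.707·F - 0.707·B (front coded in phase, rear in antiphase); that matrix and the helper names are illustrative assumptions, not a matrix specified here. With unit-power uncorrelated noise, the Left/Right-only and Front/Rear-only cases yield Lt/Rt pairs with the same powers and near-zero cross-correlation, so a decoder that sees only Lt/Rt cannot tell them apart.

```python
import random

K = 0.7071067811865476  # 1/sqrt(2)


def encode_lt_rt(left, right, front, rear):
    """Hypothetical 4:2 matrix: front coded in phase, rear in antiphase."""
    lt = [lv + K * fv + K * bv for lv, fv, bv in zip(left, front, rear)]
    rt = [rv + K * fv - K * bv for rv, fv, bv in zip(right, front, rear)]
    return lt, rt


def lt_rt_stats(lt, rt):
    """Mean power of Lt, mean power of Rt, and their normalized cross-correlation."""
    n = len(lt)
    p_lt = sum(v * v for v in lt) / n
    p_rt = sum(v * v for v in rt) / n
    cross = sum(a * b for a, b in zip(lt, rt)) / n
    return p_lt, p_rt, cross / (p_lt * p_rt) ** 0.5


rng = random.Random(0)
N = 100_000


def noise():
    """Unit-power Gaussian noise, independent on every call."""
    return [rng.gauss(0.0, 1.0) for _ in range(N)]


silence = [0.0] * N
lr_only = lt_rt_stats(*encode_lt_rt(noise(), noise(), silence, silence))
fb_only = lt_rt_stats(*encode_lt_rt(silence, silence, noise(), noise()))
print("Left/Right only:", [round(v, 2) for v in lr_only])
print("Front/Rear only:", [round(v, 2) for v in fb_only])
```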
  • The underlying problem is that the use of interchannel phase to code front/back position in N:2:N matrixing systems runs counter to the operation of the human ear, which does not use phase to judge front/back position.
  • The present invention works best with at least three non-collinear cardinal channels, so that front/back position is indicated by the assumed directions of the cardinal channels, without assigning different directions depending on their relative phases or polarities.
  • A pair of uncorrelated or anti-correlated channel translation cardinal signals unambiguously decodes to isolated cardinal output channel signals, with no intermediate signal and no "rearward" direction indicated.
  • Each decoding module, especially one with two input channels, resembles a prior-art active 2:N decoder with the front/back detection disabled or modified and an arbitrary number of output channels.
  • In the 2:N designation, 2 is the number of input channels and N the number of output channels.
  • The decoding module may at times exhibit less than perfect channel recovery in the presence of multiple active source direction signals.
  • However, the human auditory system, limited to using just two ears, will tend to be subject to the same limitations, allowing the system to be perceived as discrete even with all channels operating. Isolated channel quality, with other channels muted, is still a consideration, to accommodate listeners who may be situated near one speaker.
  • The decoding modules themselves could, in principle, be independent, autonomous entities. Such is, however, not usually the case.
  • A given transmitted channel will, in general, share separate output signals with two or more neighboring cardinal channels.
  • If independent decoding modules are used to decode the array, each will be influenced by the output signals of neighboring channels, resulting in possibly serious decoding errors.
  • Two output signals of neighboring decoding modules will "pull", or gravitate, toward each other, because of the increased level of the common cardinal node containing both signals. If, as is likely, the signals are dynamic, so too will be the amount of interaction, leading to signal-dependent dynamic positioning errors of a possibly highly objectionable nature.
  • Suppose cardinal channel pair A/B contains a common signal X along with individual, uncorrelated signals Y and Z, so that A = X + Y and B = X + Z.
  • The total energy in cardinal channel A is then the sum of the energies of signals X and Y, since the cross term of uncorrelated signals averages to zero.
  • The averaged cross-product of A and B is equal to the energy of the common signal component in each channel. If the common signal is not shared equally, i.e., it is panned toward one of the cardinals, the averaged cross-product will be the geometric mean of the energies of the common components in A and B, from which individual channel common-energy estimates can be derived by normalizing by the square root of the ratio of the channel amplitudes.
  • Actual time averages are computed with a leaky integrator having a suitable decay time constant, to reflect ongoing activity.
  • The time-constant smoothing can be elaborated with nonlinear attack and decay time options and, in a multiband system, may be scaled with frequency.
  • The sign of each product is discarded by taking the absolute value of the product.
  • In this way, the amount of common output channel signal energy can be estimated.
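A minimal sketch of this estimate, using a one-pole leaky integrator for the time averages. The smoothing coefficient, the Gaussian test signals and the pan gains a and b are illustrative assumptions, and taking the absolute value of the smoothed cross-product is one reading of the sign-discarding step. With A = aX + Y and B = bX + Z, the averaged cross-product should approach ab·E[X²], the geometric mean of the common energies a²E[X²] and b²E[X²], while the energy of A approaches a²E[X²] + E[Y²].

```python
import random


def leaky_average(samples, alpha):
    """One-pole leaky integrator: avg += alpha * (sample - avg)."""
    avg = 0.0
    for s in samples:
        avg += alpha * (s - avg)
    return avg


def common_energy_estimate(a_sig, b_sig, alpha=1e-4):
    """Smoothed |A*B| cross-product, plus the smoothed energies of A and B."""
    cross = abs(leaky_average((x * y for x, y in zip(a_sig, b_sig)), alpha))
    energy_a = leaky_average((x * x for x in a_sig), alpha)
    energy_b = leaky_average((y * y for y in b_sig), alpha)
    return cross, energy_a, energy_b


rng = random.Random(1)
n = 200_000
X = [rng.gauss(0.0, 1.0) for _ in range(n)]  # common signal, unit power
Y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # individual to A
Z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # individual to B
a, b = 0.9, 0.45                             # common signal panned toward A
A = [a * x + y for x, y in zip(X, Y)]
B = [b * x + z for x, z in zip(X, Z)]

cross, ea, eb = common_energy_estimate(A, B)
print(f"cross-product {cross:.3f} vs a*b = {a * b:.3f}")
print(f"energy of A {ea:.3f} vs a^2 + 1 = {a * a + 1:.3f}")
```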
  • The above example involved a single interpolation processor, but if one or more of the A/B(/C) nodes were common to another module with its own common signal component, uncorrelated with any other signals, then the averaged cross-product computed above would not be affected, making the calculation inherently free of any image-pulling effects. (Note: if the two output signals are not uncorrelated, they will tend to pull the decoders somewhat, but should have a similar effect on the human ear, so again system operation should remain faithful to human audition.)
  • The supervisory routine or function can inform neighboring modules of each other's common energy, at which point the extraction of the output channel signals can proceed as described below.
  • The calculation of the common energy used by a module at a node must take into account the hierarchy of possibly overlapping modules of different order, and subtract the common energy of a higher-order module from the estimated common energy of any lower-order module sharing the same nodes.
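The hierarchical bookkeeping can be sketched as follows: modules are processed in descending order of the number of cardinal channels they span, and each module's raw common-energy estimate is reduced by the common energy already claimed by higher-order modules that contain all of its nodes. The `Module` dataclass, the `resolve_hierarchy` function, the subset test for "sharing the same nodes" and the clamp at zero are all hypothetical choices; the patent does not specify this data structure.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    nodes: frozenset      # cardinal channels the module is associated with
    raw_common: float     # raw common-energy estimate from its own inputs
    common: float = 0.0   # estimate after hierarchical correction


def resolve_hierarchy(modules):
    """Process higher-order modules first; subtract their common energy from
    lower-order modules whose nodes they contain (hypothetical supervisor)."""
    ordered = sorted(modules, key=lambda m: len(m.nodes), reverse=True)
    done = []
    for m in ordered:
        claimed = sum(
            h.common for h in done
            if len(h.nodes) > len(m.nodes) and m.nodes <= h.nodes
        )
        m.common = max(m.raw_common - claimed, 0.0)  # energy cannot go negative
        done.append(m)
    return ordered


mods = [
    Module("L-C", frozenset({"L", "C"}), raw_common=0.50),
    Module("L-C-Top", frozenset({"L", "C", "Top"}), raw_common=0.30),
    Module("C-R", frozenset({"C", "R"}), raw_common=0.20),
]
for m in resolve_hierarchy(mods):
    print(m.name, round(m.common, 2))
```

Here the trio module claims its 0.30 first, so the L-C pair keeps only the remainder, while the C-R pair, which is not contained in the trio, is left unchanged.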
  • The process of recovering the ensemble of output channels from the transmitted channels in a linear fashion is basically one of matrixing, that is, forming weighted sums of the cardinal channels to derive output channel signals.
  • The optimal choice of matrix scale factors is generally signal dependent. Indeed, if the number of currently active output channels is equal to the number of transmitted channels (but representing different directions), making the system exactly constrained, it is mathematically possible to compute an exact inverse of the effective encoding matrix and recover isolated versions of the source signals. Even if the number of active output channels is greater than the number of cardinals, it may still be possible to compute a matrix pseudo-inverse.
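For the exactly constrained case, a two-cardinal illustration: two sources panned with known coefficients form a 2×2 effective encoding matrix, and its exact inverse recovers them. The coefficient pairs (0.89, 0.45) and (0.45, 0.89) and the helper names are chosen for illustration; the sketch assumes the encoding matrix is known and non-singular, which is what makes recovery exact.

```python
def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix [[a, b], [c, d]]; requires ad - bc != 0."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("encoding matrix is singular; no exact inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(m, vec):
    """Matrix-vector product for small nested-list matrices."""
    return [sum(m_ij * v for m_ij, v in zip(row, vec)) for row in m]


# Rows are cardinals (A, B), columns are sources: cardinals = E @ sources.
# Source 1 is panned (0.89, 0.45) to (A, B); source 2 is panned (0.45, 0.89).
E = [[0.89, 0.45],
     [0.45, 0.89]]
sources = [0.3, -1.2]          # one sample of each source
cardinals = apply(E, sources)  # what the decoder receives
recovered = apply(inverse_2x2(E), cardinals)
print(recovered)               # matches sources up to rounding
```

With more active outputs than cardinals the system is underdetermined, which is where the pseudo-inverse mentioned above would replace `inverse_2x2`.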
  • Ratios are typically calculated as the quotient of one channel's matrix coefficient over the RMS sum of all of that output channel's matrix coefficients (usually 1). For example, in a two-input module with inputs L and R, the energy ratio used would be the L energy over the sum of the L and R energies ("L-ratio"), which has a well-behaved range of 0 to 1.
  • If the two-input decoding module has five output channels with effective encoding matrix coefficient pairs of (1.0, 0), (0.89, 0.45), (0.71, 0.71), (0.45, 0.89) and (0, 1.0), the corresponding L-ratios are 1.0, 0.89, 0.71, 0.45 and 0, since each scale factor pair has an RMS sum of 1.0.
  • The dominant direction indicator is calculated as the vector sum of the cardinal directions, weighted by the relative energy. For a two-input module, this simplifies to the L-ratio of the normalized input signal power levels.
  • The output channels bracketing the dominant direction are determined by comparing the dominant-direction L-ratio from the previous step to the L-ratios of the output channels. For example, if the L-ratio of the above five-output decoding module's inputs is 0.75, the second and third output channels bracket the dominant signal direction, since 0.89 > 0.75 > 0.71. Panning scale factors to map the dominant signal onto the nearest bracketing channels are calculated from the ratio of the anti-dominant signal levels of the channels.
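The L-ratio bookkeeping and the bracketing search can be sketched as below, using the five coefficient pairs given above. One assumption deserves flagging: the text computes the output-channel L-ratios as amplitude quotients but describes the input L-ratio in terms of energies, so this sketch expresses the input L-ratio in the same amplitude terms (square root of the L share of total power) to keep the comparison like-for-like. Function names are illustrative, and the anti-dominant panning scale factors are not modeled.

```python
import math

# Effective encoding coefficient pairs (L, R) of the five output channels.
OUTPUT_COEFFS = [(1.0, 0.0), (0.89, 0.45), (0.71, 0.71), (0.45, 0.89), (0.0, 1.0)]


def coeff_l_ratio(cl, cr):
    """L coefficient over the RMS sum of the channel's coefficients."""
    return cl / math.hypot(cl, cr)


def input_l_ratio(power_l, power_r):
    """Assumed amplitude-domain L-ratio of the module's input power levels."""
    return math.sqrt(power_l / (power_l + power_r))


def bracketing_outputs(dominant_ratio, coeffs=OUTPUT_COEFFS):
    """0-based indices of the adjacent output pair whose L-ratios bracket it."""
    ratios = [coeff_l_ratio(cl, cr) for cl, cr in coeffs]  # descending, 1 -> 0
    for i in range(len(ratios) - 1):
        if ratios[i] >= dominant_ratio >= ratios[i + 1]:
            return i, i + 1
    raise ValueError("dominant ratio outside [0, 1]")


print([round(coeff_l_ratio(*c), 2) for c in OUTPUT_COEFFS])
print(bracketing_outputs(0.75))  # second and third outputs
```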
  • The anti-dominant signal associated with a particular output channel is the signal that results when the corresponding decoding module's input signals are matrixed with the output channel's anti-dominant matrix scale factors.
  • Suppose a single dominant signal is panned to an output channel with encode scale factors (A, B).
  • The anti-dominant signal is then calculated and panned, with suitable gain scaling, to all non-dominant channels.
  • The anti-dominant signal is a matrixed signal lacking any of the dominant signal. If the inputs to a decoding module are (x(t), y(t)) with normalized amplitudes (X, Y), the dominant signal is Xx(t) + Yy(t) and the anti-dominant signal is Yx(t) - Xy(t), irrespective of the positions of the non-dominant output channels.
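A quick check of the dominant/anti-dominant pair, assuming the normalized amplitudes satisfy X² + Y² = 1 and the inputs contain only a single dominant source s(t) panned as x = X·s, y = Y·s. Under those assumptions the anti-dominant combination Y·x - X·y cancels the source (to floating-point rounding), while the dominant combination returns s. The random test signal, the 30-degree pan and the helper name are illustrative.

```python
import math
import random


def dominant_and_anti(x, y, X, Y):
    """Return (dominant, anti-dominant) sample lists for inputs x, y."""
    dom = [X * xs + Y * ys for xs, ys in zip(x, y)]
    anti = [Y * xs - X * ys for xs, ys in zip(x, y)]
    return dom, anti


rng = random.Random(2)
s = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # single dominant source
theta = math.radians(30.0)                          # illustrative pan angle
X, Y = math.cos(theta), math.sin(theta)             # X**2 + Y**2 == 1
x = [X * v for v in s]
y = [Y * v for v in s]

dom, anti = dominant_and_anti(x, y, X, Y)
print(max(abs(v) for v in anti))                # ~0: no dominant leakage
print(max(abs(d - v) for d, v in zip(dom, s)))  # ~0: dominant recovered
```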
  • A second signal distribution is calculated using the "passive" matrix, which is basically the output channel matrix scale factors already discussed, scaled to preserve power.
  • The cross-correlation of the decoding module's input signals is calculated as the averaged cross-product of the input signals divided by the square root of the product of the normalized input levels.
  • The final output signals are then calculated as a weighted crossfade sum of the dominant and passive signal distributions, using the decoding module's input-signal cross-correlation to derive the crossfade factor.
  • For fully correlated inputs, the dominant/anti-dominant distribution is used exclusively.
  • As the correlation decreases, the output signal array is broadened by cross-fading to the passive distribution, reaching completion at a low positive value of correlation, typically 0.2 to 0.4, depending on the number of output channels connected to the decoding module.
  • As the correlation decreases further, the passive amplitude output distribution is progressively bowed outward, reducing the output channel levels, emulating the response of the human ear to such signals.
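The correlation measure and the crossfade can be sketched as follows. The linear ramp between correlation 1.0 (dominant distribution only) and an assumed completion point of 0.3 (passive distribution only, within the 0.2 to 0.4 range quoted) is an illustrative choice; the shape of the ramp is not specified here, and the outward bowing at lower correlations is not modeled. The gain vectors for the two distributions are placeholders, and all function names are hypothetical.

```python
import math


def normalized_correlation(x, y):
    """Averaged cross-product over sqrt of the product of averaged powers."""
    n = len(x)
    cross = sum(a * b for a, b in zip(x, y)) / n
    px = sum(a * a for a in x) / n
    py = sum(b * b for b in y) / n
    if px == 0.0 or py == 0.0:
        return 0.0
    return cross / math.sqrt(px * py)


def crossfade_weight(corr, completion=0.3):
    """Weight on the dominant distribution: 1 at corr=1, 0 at corr<=completion."""
    w = (corr - completion) / (1.0 - completion)
    return min(max(w, 0.0), 1.0)


def blend_gains(dominant_gains, passive_gains, corr, completion=0.3):
    """Per-output weighted crossfade sum of the two gain distributions."""
    w = crossfade_weight(corr, completion)
    return [w * d + (1.0 - w) * p for d, p in zip(dominant_gains, passive_gains)]


# Placeholder gains for a five-output module (not taken from the patent).
dominant = [0.0, 0.8, 0.6, 0.0, 0.0]
passive = [0.45, 0.45, 0.45, 0.45, 0.45]
print(blend_gains(dominant, passive, corr=1.0))   # dominant distribution only
print(blend_gains(dominant, passive, corr=0.65))  # halfway between the two
print(blend_gains(dominant, passive, corr=0.1))   # passive distribution only
```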
  • That notion may have practical significance in the application of channel translation to existing 5.1-channel surround material, which, of course, lacks any vertical channel. However, such material may contain vertical information, such as fly-overs, which are panned across many or all of the horizontal channels.
  • It should be possible to extract a virtual vertical channel from such source material by looking for correlations among non-neighboring channels or groups of channels. Where such correlations exist, they will usually indicate the presence of vertical information from above, rather than below, the listener.
  • The human ear exhibits a certain degree of positional memory, or inertia, in that a briefly dominant sound from a given direction that is clearly localized will cause other, less distinctly localizable sounds from that general direction to be perceived as coming from the same source.
  • Each decoding module is based on the coincident cross-correlation of its input signals. This may underestimate the amount of output signal content under some conditions, for example with a naturally recorded signal in which non-centered directions have slightly different arrival times, along with unequal amplitudes, resulting in a reduced correlation value. The effect may be exaggerated if wide-spaced microphones are used, with commensurately elongated interchannel delays. To compensate, the correlation calculation can be extended to cover a range of interchannel time delays, at the expense of slightly higher processing (MIPS) requirements. Also, since the neurons on the auditory nerve have an effective time constant of about 1 ms, more realistic correlation values may be obtained by first smoothing the rectified audio with a smoother having a 1 ms time constant.
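Extending the correlation over a range of interchannel delays can be sketched as taking the largest normalized cross-correlation over lags within ±max_lag samples. The search range, the use of the maximum (rather than, say, a weighted average across lags), the 48 kHz sample-rate interpretation and the test signal are assumptions; the 1 ms smoothing of rectified audio is not shown.

```python
import math
import random


def lagged_correlation(x, y, lag):
    """Normalized cross-correlation of x[n] with y[n + lag] over the overlap."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    cross = sum(p * q for p, q in zip(a, b))
    pa = sum(p * p for p in a)
    pb = sum(q * q for q in b)
    return cross / math.sqrt(pa * pb) if pa and pb else 0.0


def max_correlation(x, y, max_lag):
    """Best (correlation, lag) over lags in [-max_lag, max_lag]."""
    return max((lagged_correlation(x, y, k), k) for k in range(-max_lag, max_lag + 1))


rng = random.Random(3)
src = [rng.gauss(0.0, 1.0) for _ in range(5000)]
delay = 12                          # e.g. 12 samples, 0.25 ms at 48 kHz
left = src
right = [0.0] * delay + [0.6 * v for v in src[:-delay]]  # later and quieter

print(round(lagged_correlation(left, right, 0), 2))  # coincident: near 0
print(max_correlation(left, right, max_lag=48))      # peaks at lag 12
```

The coincident measure sees almost no correlation between the two microphone-like signals, while the lag search recovers full correlation at the true delay, which is the underestimate the extension is meant to correct.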
  • the evenness of the spread, when the material is processed with a channel translation decoder, can be increased by slightly mixing adjacent channels, thereby increasing their correlation, which causes the channel translation decoding module to provide a more even spread among its intermediate output channels.
  • Such mixing can be done selectively, for example leaving the center front channel signal unmixed, to preserve the compactness of the dialog track.
  • Loudness Compression/Expansion: When the encoding process involves mixing a larger number of channels to a smaller number, there is a potential for clipping of the encoded signal if some form of gain compensation is not provided. This problem exists as well for conventional matrix encoding, but is potentially of greater significance for channel translation, because the number of channels being mixed to a given output channel is greater.
  • an overall gain scale factor is derived by the encoder and conveyed in the encoded bitstream to the decoder. Normally, this value is 0 dB, but can be set to a nonzero attenuating value by the encoder to avoid clipping, with the decoder providing an equivalent amount of compensating gain.
  • If the decoder is used to process an existing multichannel program that lacks such a scale factor (e.g., an existing 5.1 channel soundtrack), it could optionally use a fixed scale factor with an assumed value (presumably 0 dB), apply an expansion function based on signal level and/or dynamics, or possibly make use of available metadata, such as a dialog normalization value, to adjust the decoder gain.
  • the present invention and its various aspects may be implemented in analog circuitry, or more probably as software functions performed in digital signal processors, programmed general-purpose digital computers, and/or special purpose digital computers. Interfaces between analog and digital signal streams may be performed in appropriate hardware and/or as functions in software and/or
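
The cross-fade toward the passive distribution described in the first bullet can be sketched as a correlation-dependent blend of two per-output gain vectors. This is a minimal sketch under stated assumptions, not the patented decoder: the linear ramp, the amplitude-domain interpolation, the default completion point of 0.3 (picked from the 0.2 to 0.4 range given in the text), and the names `passive_weight` and `broadened_gains` are all hypothetical. The outward bowing of the passive distribution described in the following bullet is not modeled.

```python
def passive_weight(correlation: float, r_full_passive: float = 0.3) -> float:
    """Fraction of the passive distribution in the output (0 = fully steered).

    Hypothetical linear ramp: fully steered at correlation 1.0 and fully
    passive at or below r_full_passive (the text cites 0.2 to 0.4).
    """
    if not 0.0 < r_full_passive < 1.0:
        raise ValueError("r_full_passive must lie strictly between 0 and 1")
    if correlation >= 1.0:
        return 0.0
    if correlation <= r_full_passive:
        return 1.0
    return (1.0 - correlation) / (1.0 - r_full_passive)


def broadened_gains(steered, passive, correlation, r_full_passive=0.3):
    """Cross-fade per-output amplitude gains from the steered toward the passive set."""
    w = passive_weight(correlation, r_full_passive)
    return [(1.0 - w) * s + w * p for s, p in zip(steered, passive)]


# Three outputs of one decoding module; steering points at the middle output.
steered = [0.0, 1.0, 0.0]
passive = [0.5, 0.5, 0.5]  # hypothetical passive distribution
print(broadened_gains(steered, passive, correlation=0.65))
```

Interpolating amplitudes keeps the sketch short; a real decoder might instead interpolate in the power domain so that total output power stays constant across the fade.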
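
One way to look for the virtual vertical channel suggested in the text is to measure correlation between channels that are not neighbors on the horizontal ring. The sketch below is hypothetical throughout. It assumes the channels are listed in ring order (for 5.1, for example, L, C, R, Rs, Ls with the LFE omitted), uses zero-lag normalized correlation on a single block of samples, and flags pairs above an arbitrary threshold of 0.5. Groups of channels are not handled.

```python
import math
from itertools import combinations


def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sample blocks."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def non_neighbor_pairs(n_channels):
    """Index pairs (i, j), i < j, that are not adjacent on a ring of n_channels."""
    return [
        (i, j)
        for i, j in combinations(range(n_channels), 2)
        if j - i not in (1, n_channels - 1)
    ]


def vertical_cues(channels, threshold=0.5):
    """Return [((i, j), r), ...] for non-neighboring pairs whose correlation exceeds threshold."""
    cues = []
    for i, j in non_neighbor_pairs(len(channels)):
        r = normalized_correlation(channels[i], channels[j])
        if r > threshold:
            cues.append(((i, j), r))
    return cues


# Assumed order L, C, R, Rs, Ls: a "fly-over" component shared by L and Rs.
flyover = [math.sin(0.05 * k) for k in range(256)]
silence = [0.0] * 256
print(vertical_cues([flyover, silence, silence, flyover, silence]))
```

In this ordering, a flagged pair such as (0, 3), L with Rs, is a candidate. Correlated L and R, pair (0, 2), is also the ordinary phantom-center case, so a practical detector would need more than this pairwise test.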
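
The two refinements proposed in the cross-correlation bullet can be sketched as follows: a search over a range of interchannel delays, and smoothing of the rectified audio with a 1 msec time constant. The one-pole smoother, the convention that a positive lag pairs x[n] with y[n + lag], and the function names are illustrative assumptions, not taken from the source. The lag range is left to the caller. Correlating rectified envelopes discards polarity, so whether the lag search runs on raw or smoothed signals is a design choice the text does not settle.

```python
import math


def smooth_rectified(x, fs, tau=1e-3):
    """Full-wave rectify x, then smooth with a one-pole lowpass of time constant tau seconds."""
    alpha = math.exp(-1.0 / (tau * fs))
    out, state = [], 0.0
    for sample in x:
        state = alpha * state + (1.0 - alpha) * abs(sample)
        out.append(state)
    return out


def lagged_correlation(x, y, lag):
    """Normalized correlation pairing x[n] with y[n + lag] over the overlapping samples."""
    n = min(len(x), len(y))
    if lag >= 0:
        xs, ys = x[: n - lag], y[lag:n]
    else:
        xs, ys = x[-lag:n], y[: n + lag]
    num = sum(a * b for a, b in zip(xs, ys))
    den = math.sqrt(sum(a * a for a in xs) * sum(b * b for b in ys))
    return num / den if den > 0.0 else 0.0


def max_lagged_correlation(x, y, max_lag):
    """Search lags -max_lag..max_lag and return (best_lag, best_correlation)."""
    candidates = ((lag, lagged_correlation(x, y, lag)) for lag in range(-max_lag, max_lag + 1))
    return max(candidates, key=lambda item: item[1])


fs = 48000
x = [math.sin(0.3 * k) + 0.5 * math.sin(1.7 * k) for k in range(480)]
y = [0.0] * 3 + x[:-3]  # y arrives 3 samples (62.5 us at 48 kHz) after x
print(max_lagged_correlation(x, y, max_lag=24))  # search +/- 0.5 ms
print(max_lagged_correlation(smooth_rectified(x, fs), smooth_rectified(y, fs), max_lag=24))
```

Each extra lag adds one full correlation pass, which is the "slightly higher processing requirements" trade-off the text mentions.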
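
Selective mixing of adjacent channels before translation decoding, with the center front channel left unmixed, might look like the sketch below. All of the following are assumptions for illustration: the mixing amount `k`, the ring ordering of channels, the rule that each channel's own gain drops by `k` per neighbor so that the gains still sum to 1, and the `exclude` set used to keep the center out of the mix.

```python
def mix_adjacent(channels, k=0.1, exclude=frozenset()):
    """Blend each channel with its ring neighbors by amount k.

    channels: equal-length sample lists in ring order.
    exclude: indices that are neither modified nor mixed into their neighbors.
    """
    n = len(channels)
    out = []
    for i, x in enumerate(channels):
        if i in exclude:
            out.append(list(x))  # e.g. center dialog passes through untouched
            continue
        neighbors = sorted({(i - 1) % n, (i + 1) % n} - {i} - set(exclude))
        own = 1.0 - k * len(neighbors)  # own gain plus neighbor gains sum to 1
        out.append([
            own * x[t] + k * sum(channels[j][t] for j in neighbors)
            for t in range(len(x))
        ])
    return out


# Assumed order L, C, R, Rs, Ls; index 1 (center) is excluded.
print(mix_adjacent([[1.0], [10.0], [0.0], [0.0], [0.0]], k=0.1, exclude={1}))
```

In the example, L leaks a little into Ls (its other ring neighbor) but not into the excluded center, and the center's content stays confined to its own channel.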
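
The encoder-side scale factor and the decoder-side compensating gain described in the loudness bullets can be sketched as a per-block, peak-based attenuation. Everything here is an assumption for illustration: full scale is taken as 1.0, the scale factor is an unquantized value in dB, one value covers a whole block, and `decode_with_scale_factor` falls back to the assumed 0 dB when no value is available. The expansion-function and dialog-normalization fallbacks mentioned in the text are not shown.

```python
import math


def encode_with_scale_factor(downmix, full_scale=1.0):
    """Return (scaled_samples, scale_db); scale_db stays 0.0 unless the block would clip."""
    peak = max((abs(s) for s in downmix), default=0.0)
    if peak <= full_scale:
        return list(downmix), 0.0
    scale_db = 20.0 * math.log10(full_scale / peak)  # negative value = attenuation
    gain = 10.0 ** (scale_db / 20.0)
    return [s * gain for s in downmix], scale_db


def decode_with_scale_factor(encoded, scale_db=None):
    """Undo the encoder attenuation; with no transmitted value, assume 0 dB."""
    if scale_db is None:
        scale_db = 0.0  # legacy material (e.g. an existing 5.1 mix) carries no scale factor
    gain = 10.0 ** (-scale_db / 20.0)
    return [s * gain for s in encoded]


# Three source channels mixed into one encoded channel can exceed full scale.
sources = [[0.8, -0.6], [0.7, -0.5], [0.6, 0.2]]
downmix = [sum(column) for column in zip(*sources)]
encoded, scale_db = encode_with_scale_factor(downmix)
print(round(scale_db, 2), decode_with_scale_factor(encoded, scale_db))
```

Because the decoder applies exactly the inverse gain, the round trip restores the original levels while the encoded signal itself stays within full scale.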

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Cosmetics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)
PCT/US2002/003619 2001-02-07 2002-02-07 Audio channel translation Ceased WO2002063925A2 (en)

Priority Applications (10)

Application Number Priority Date Filing Date Title
HK04109904.2A HK1066966B (en) 2001-02-07 2002-02-07 Method for audio channel translation
CA2437764A CA2437764C (en) 2001-02-07 2002-02-07 Audio channel translation
JP2002563741A JP2004526355A (ja) 2001-02-07 2002-02-07 Audio channel conversion method
US10/467,213 US20040062401A1 (en) 2002-02-07 2002-02-07 Audio channel translation
AU2002251896A AU2002251896B2 (en) 2001-02-07 2002-02-07 Audio channel translation
EP02720929A EP1410686B1 (en) 2001-02-07 2002-02-07 Audio channel translation
KR1020037010231A KR100904985B1 (ko) 2001-02-07 2002-02-07 Audio channel conversion
MXPA03007064A MXPA03007064A (es) 2001-02-07 2002-02-07 Audio channel conversion.
DE60225806T DE60225806T2 (de) 2001-02-07 2002-02-07 Audio channel translation
US10/522,515 US7660424B2 (en) 2001-02-07 2003-08-06 Audio channel spatial translation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US26728401P 2001-02-07 2001-02-07
US60/267,284 2001-02-07

Related Child Applications (3)

Application Number Title Priority Date Filing Date
US10467213 A-371-Of-International 2002-02-07
US10/522,515 Continuation-In-Part US7660424B2 (en) 2001-02-07 2003-08-06 Audio channel spatial translation
US10522515 Continuation-In-Part 2005-01-27

Publications (3)

Publication Number Publication Date
WO2002063925A2 WO2002063925A2 (en) 2002-08-15
WO2002063925A3 WO2002063925A3 (en) 2004-02-19
WO2002063925A8 WO2002063925A8 (en) 2004-03-25

Family

ID=23018136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/003619 Ceased WO2002063925A2 (en) 2001-02-07 2002-02-07 Audio channel translation

Country Status (10)

Country Link
EP (1) EP1410686B1
JP (1) JP2004526355A
KR (1) KR100904985B1
CN (1) CN1275498C
AT (1) ATE390823T1
AU (1) AU2002251896B2
CA (1) CA2437764C
DE (1) DE60225806T2
MX (1) MXPA03007064A
WO (1) WO2002063925A2

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
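WO2006011367A1 (ja) * 2004-07-30 2006-02-02 Matsushita Electric Industrial Co., Ltd. Audio signal encoding device and decoding device
KR100763919B1 (ko) * 2006-08-03 2007-10-05 Samsung Electronics Co., Ltd. Method and apparatus for decoding an input signal, in which a multichannel signal is compressed into a mono or stereo signal, into a 2-channel binaural signal
KR100803344B1 (ko) * 2004-01-20 2008-02-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for constructing a multichannel output signal and for generating a downmix signal
JP2008509600A (ja) * 2004-08-03 2008-03-27 Dolby Laboratories Licensing Corporation Combining audio signals using auditory scene analysis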
EP1914722A1 (en) 2004-03-01 2008-04-23 Dolby Laboratories Licensing Corporation Multichannel audio decoding
EP1538876A3 (en) * 2003-12-03 2009-04-29 Fondazione Scuola di San Giorgio Equipment for collection and measurement of quadraphonic sound data and metadata as well as a corresponding recording procedure
US7660424B2 (en) 2001-02-07 2010-02-09 Dolby Laboratories Licensing Corporation Audio channel spatial translation
RU2493617C2 (ru) * 2008-09-11 2013-09-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal, and apparatus for providing a two-channel audio signal and a set of spatial cues
US9008338B2 (en) 2010-09-30 2015-04-14 Panasonic Intellectual Property Management Co., Ltd. Audio reproduction apparatus and audio reproduction method
US9183839B2 (en) 2008-09-11 2015-11-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal and apparatus for providing a two-channel audio signal and a set of spatial cues
US9197978B2 (en) 2009-03-31 2015-11-24 Panasonic Intellectual Property Management Co., Ltd. Sound reproduction apparatus and sound reproduction method
US10021500B2 (en) 2013-09-02 2018-07-10 Huawei Technologies Co., Ltd. Audio file playing method and apparatus
US10939219B2 (en) 2010-03-23 2021-03-02 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for audio reproduction

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7551745B2 (en) * 2003-04-24 2009-06-23 Dolby Laboratories Licensing Corporation Volume and compression control in movie theaters
US11294618B2 (en) 2003-07-28 2022-04-05 Sonos, Inc. Media player system
US8290603B1 (en) 2004-06-05 2012-10-16 Sonos, Inc. User interfaces for controlling and manipulating groupings in a multi-zone media system
US11106425B2 (en) 2003-07-28 2021-08-31 Sonos, Inc. Synchronizing operations among a plurality of independently clocked digital data processing devices
US8020023B2 (en) 2003-07-28 2011-09-13 Sonos, Inc. Systems and methods for synchronizing operations among a plurality of independently clocked digital data processing devices without a voltage controlled crystal oscillator
US11106424B2 (en) 2003-07-28 2021-08-31 Sonos, Inc. Synchronizing operations among a plurality of independently clocked digital data processing devices
US8234395B2 (en) 2003-07-28 2012-07-31 Sonos, Inc. System and method for synchronizing operations among a plurality of independently clocked digital data processing devices
US8086752B2 (en) 2006-11-22 2011-12-27 Sonos, Inc. Systems and methods for synchronizing operations among a plurality of independently clocked digital data processing devices that independently source digital data
US10613817B2 (en) 2003-07-28 2020-04-07 Sonos, Inc. Method and apparatus for displaying a list of tracks scheduled for playback by a synchrony group
US11650784B2 (en) 2003-07-28 2023-05-16 Sonos, Inc. Adjusting volume levels
US9977561B2 (en) 2004-04-01 2018-05-22 Sonos, Inc. Systems, methods, apparatus, and articles of manufacture to provide guest access
US8024055B1 (en) 2004-05-15 2011-09-20 Sonos, Inc. Method and system for controlling amplifiers
US8326951B1 (en) 2004-06-05 2012-12-04 Sonos, Inc. Establishing a secure wireless network with minimum human intervention
US8868698B2 (en) 2004-06-05 2014-10-21 Sonos, Inc. Establishing a secure wireless network with minimum human intervention
US7283634B2 (en) * 2004-08-31 2007-10-16 Dts, Inc. Method of mixing audio channels using correlated outputs
JP4917039B2 (ja) * 2004-10-28 2012-04-18 DTS Washington, LLC Acoustic spatial environment engine
JP4997781B2 (ja) * 2006-02-14 2012-08-08 Oki Electric Industry Co., Ltd. Mixdown method and mixdown apparatus
US8483853B1 (en) 2006-09-12 2013-07-09 Sonos, Inc. Controlling and manipulating groupings in a multi-zone media system
US9202509B2 (en) 2006-09-12 2015-12-01 Sonos, Inc. Controlling and grouping in a multi-zone media system
US12167216B2 (en) 2006-09-12 2024-12-10 Sonos, Inc. Playback device pairing
US8788080B1 (en) 2006-09-12 2014-07-22 Sonos, Inc. Multi-channel pairing in a media system
US8290782B2 (en) * 2008-07-24 2012-10-16 Dts, Inc. Compression of audio scale-factors by two-dimensional transformation
EP2398257B1 (en) 2008-12-18 2017-05-10 Dolby Laboratories Licensing Corporation Audio channel spatial translation
US11265652B2 (en) 2011-01-25 2022-03-01 Sonos, Inc. Playback device pairing
US11429343B2 (en) 2011-01-25 2022-08-30 Sonos, Inc. Stereo playback configuration and control
US8938312B2 (en) 2011-04-18 2015-01-20 Sonos, Inc. Smart line-in processing
US9042556B2 (en) 2011-07-19 2015-05-26 Sonos, Inc Shaping sound responsive to speaker orientation
US9344292B2 (en) 2011-12-30 2016-05-17 Sonos, Inc. Systems and methods for player setup room names
US9729115B2 (en) 2012-04-27 2017-08-08 Sonos, Inc. Intelligently increasing the sound level of player
US9008330B2 (en) 2012-09-28 2015-04-14 Sonos, Inc. Crossover frequency adjustments for audio speakers
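KR102037418B1 (ko) 2012-12-04 2019-10-28 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
CN105075293B (zh) * 2013-03-29 2017-10-20 Samsung Electronics Co., Ltd. Audio apparatus and audio providing method thereof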
US9244516B2 (en) 2013-09-30 2016-01-26 Sonos, Inc. Media playback system using standby mode in a mesh network
US9226073B2 (en) 2014-02-06 2015-12-29 Sonos, Inc. Audio output balancing during synchronized playback
US9226087B2 (en) 2014-02-06 2015-12-29 Sonos, Inc. Audio output balancing during synchronized playback
US10248376B2 (en) 2015-06-11 2019-04-02 Sonos, Inc. Multiple groupings in a playback system
US10303422B1 (en) 2016-01-05 2019-05-28 Sonos, Inc. Multiple-device setup
KR20190055116A (ko) * 2016-10-04 2019-05-22 Omnio Sound Limited Stereo unfolding technology
US10712997B2 (en) 2016-10-17 2020-07-14 Sonos, Inc. Room association based on name
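JP7213771B2 (ja) 2019-07-22 2023-01-27 D&M Holdings Inc. Wireless audio system, wireless speaker, and method for a wireless speaker to join a group
KR20230074234A (ko) 2020-09-25 2023-05-26 Sonos, Inc. Intelligent setup for playback devices
JP2024048967A (ja) * 2022-09-28 2024-04-09 Panasonic Intellectual Property Management Co., Ltd. Sound field reproduction device, sound field reproduction method, and sound field reproduction system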

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3070683D1 (en) * 1980-12-18 1985-06-27 Kroy Ind Inc Printing apparatus and tape-ribbon cartridge therefor
US6198827B1 (en) * 1995-12-26 2001-03-06 Rocktron Corporation 5-2-5 Matrix system
JPH10174199A (ja) 1996-12-11 1998-06-26 Fujitsu Ltd Speaker sound image control device
US6009179A (en) 1997-01-24 1999-12-28 Sony Corporation Method and apparatus for electronically embedding directional cues in two channels of sound
AUPP271598A0 (en) * 1998-03-31 1998-04-23 Lake Dsp Pty Limited Headtracked processing for headtracked playback of audio signals
US6757659B1 (en) * 1998-11-16 2004-06-29 Victor Company Of Japan, Ltd. Audio signal processing apparatus
EP1054575A3 (en) 1999-05-17 2002-09-18 Bose Corporation Directional decoding

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7660424B2 (en) 2001-02-07 2010-02-09 Dolby Laboratories Licensing Corporation Audio channel spatial translation
EP1538876A3 (en) * 2003-12-03 2009-04-29 Fondazione Scuola di San Giorgio Equipment for collection and measurement of quadraphonic sound data and metadata as well as a corresponding recording procedure
KR100803344B1 (ko) * 2004-01-20 2008-02-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for constructing a multichannel output signal and for generating a downmix signal
US9672839B1 (en) 2004-03-01 2017-06-06 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US9779745B2 (en) 2004-03-01 2017-10-03 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US11308969B2 (en) 2004-03-01 2022-04-19 Dolby Laboratories Licensing Corporation Methods and apparatus for reconstructing audio signals with decorrelation and differentially coded parameters
US10796706B2 (en) 2004-03-01 2020-10-06 Dolby Laboratories Licensing Corporation Methods and apparatus for reconstructing audio signals with decorrelation and differentially coded parameters
US10460740B2 (en) 2004-03-01 2019-10-29 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8170882B2 (en) 2004-03-01 2012-05-01 Dolby Laboratories Licensing Corporation Multichannel audio coding
US10403297B2 (en) 2004-03-01 2019-09-03 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10269364B2 (en) 2004-03-01 2019-04-23 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US8983834B2 (en) 2004-03-01 2015-03-17 Dolby Laboratories Licensing Corporation Multichannel audio coding
US9691404B2 (en) 2004-03-01 2017-06-27 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9715882B2 (en) 2004-03-01 2017-07-25 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9704499B1 (en) 2004-03-01 2017-07-11 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US9640188B2 (en) 2004-03-01 2017-05-02 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9691405B1 (en) 2004-03-01 2017-06-27 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
EP1914722A1 (en) 2004-03-01 2008-04-23 Dolby Laboratories Licensing Corporation Multichannel audio decoding
US9697842B1 (en) 2004-03-01 2017-07-04 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
WO2006011367A1 (ja) * 2004-07-30 2006-02-02 Matsushita Electric Industrial Co., Ltd. Audio signal encoding device and decoding device
JP2008509600A (ja) * 2004-08-03 2008-03-27 Dolby Laboratories Licensing Corporation Combining audio signals using auditory scene analysis
US7508947B2 (en) 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
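KR100763919B1 (ko) * 2006-08-03 2007-10-05 Samsung Electronics Co., Ltd. Method and apparatus for decoding an input signal, in which a multichannel signal is compressed into a mono or stereo signal, into a 2-channel binaural signal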
US8744088B2 (en) 2006-08-03 2014-06-03 Samsung Electronics Co., Ltd. Method, medium, and apparatus decoding an input signal including compressed multi-channel signals as a mono or stereo signal into 2-channel binaural signals
US9183839B2 (en) 2008-09-11 2015-11-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal and apparatus for providing a two-channel audio signal and a set of spatial cues
RU2493617C2 (ru) * 2008-09-11 2013-09-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal, and apparatus for providing a two-channel audio signal and a set of spatial cues
US9197978B2 (en) 2009-03-31 2015-11-24 Panasonic Intellectual Property Management Co., Ltd. Sound reproduction apparatus and sound reproduction method
US10939219B2 (en) 2010-03-23 2021-03-02 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for audio reproduction
US11350231B2 (en) 2010-03-23 2022-05-31 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for audio reproduction
US12273695B2 (en) 2010-03-23 2025-04-08 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for audio reproduction
US9008338B2 (en) 2010-09-30 2015-04-14 Panasonic Intellectual Property Management Co., Ltd. Audio reproduction apparatus and audio reproduction method
US10021500B2 (en) 2013-09-02 2018-07-10 Huawei Technologies Co., Ltd. Audio file playing method and apparatus

Also Published As

Publication number Publication date
AU2002251896B2 (en) 2007-03-22
DE60225806D1 (en) 2008-05-08
WO2002063925A3 (en) 2004-02-19
KR20030079980A (ko) 2003-10-10
CA2437764C (en) 2012-04-10
KR100904985B1 (ko) 2009-06-26
DE60225806T2 (de) 2009-04-30
EP1410686A2 (en) 2004-04-21
ATE390823T1 (de) 2008-04-15
CA2437764A1 (en) 2002-08-15
CN1524399A (zh) 2004-08-25
AU2002251896A2 (en) 2002-08-19
EP1410686B1 (en) 2008-03-26
HK1066966A1 (en) 2005-04-01
JP2004526355A (ja) 2004-08-26
WO2002063925A8 (en) 2004-03-25
MXPA03007064A (es) 2004-05-24
CN1275498C (zh) 2006-09-13

Similar Documents

Publication Publication Date Title
CA2437764C (en) Audio channel translation
AU2002251896A1 (en) Audio channel translation
US11805379B2 (en) Audio channel spatial translation
US20040062401A1 (en) Audio channel translation
US7660424B2 (en) Audio channel spatial translation
EP1527655B1 (en) Audio channel spatial translation
WO2004019656A2 (en) Audio channel spatial translation
HK1066966B (en) Method for audio channel translation
HK1073963B (en) Audio channel spatial translation
HK1164603B (en) Audio channel spatial translation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2002563741

Country of ref document: JP

Ref document number: 1020037010231

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 10467213

Country of ref document: US

Ref document number: 2437764

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 028046625

Country of ref document: CN

Ref document number: PA/a/2003/007064

Country of ref document: MX

WWE Wipo information: entry into national phase

Ref document number: 1017/KOLNP/2003

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2002251896

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2002720929

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020037010231

Country of ref document: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i

Free format text: IN PCT GAZETTE 33/2002 DUE TO A TECHNICAL PROBLEM AT THE TIME OF INTERNATIONAL PUBLICATION, SOME INFORMATION WAS MISSING UNDER (81). THE MISSING INFORMATION NOW APPEARS IN THE CORRECTED VERSION

WWP Wipo information: published in national office

Ref document number: 2002720929

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 2002251896

Country of ref document: AU