WO2014087277A1 - Generation of drive signals for audio transducers - Google Patents

Generation of drive signals for audio transducers

Info

Publication number
WO2014087277A1
WO2014087277A1 (PCT/IB2013/059875)
Authority
WO
WIPO (PCT)
Prior art keywords
drive signal
audio
rendering
decorrelation
signal
Prior art date
Application number
PCT/IB2013/059875
Other languages
English (en)
Inventor
Jeroen Gerardus Henricus Koppens
Erik Gosuinus Petrus Schuijers
Werner Paulus Josephus De Bruijn
Arnoldus Werner Johannes Oomen
Original Assignee
Koninklijke Philips N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips N.V. filed Critical Koninklijke Philips N.V.
Publication of WO2014087277A1

Links

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S 3/00 — Systems employing more than two channels, e.g. quadraphonic
    • H04S 3/02 — Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S 7/00 — Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 — Control circuits for electronic adaptation of the sound field
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S 2400/00 — Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/11 — Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S 2420/00 — Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/01 — Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

  • the invention relates to generation of drive signals for audio transducers and in particular, but not exclusively, to generation of drive signals from audio signals
  • Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication increasingly has replaced analogue representation and communication.
  • distribution and storage of audio content, such as speech and music, is increasingly based on digital content encoding.
  • audio consumption has increasingly become an enveloping three dimensional experience with e.g. surround sound and home cinema setups becoming prevalent.
  • Audio encoding formats have been developed to provide increasingly capable, varied and flexible audio services and in particular audio encoding formats supporting spatial audio services have been developed.
  • Well known audio coding technologies like DTS and Dolby Digital produce a coded multi-channel audio signal that represents the spatial image as a number of channels that are placed around the listener at fixed positions. For a speaker setup that is different from the setup that corresponds to the multi-channel signal, the spatial image will be suboptimal. Also, these channel based audio coding systems are typically not able to cope with a number of speakers that is different from the number of speakers represented by the multi-channel signal.
  • MPEG Surround provides a multi-channel audio coding tool that allows existing mono- or stereo-based coders to be extended to multi-channel audio applications.
  • Fig. 1 illustrates an example of elements of an MPEG Surround system.
  • an MPEG Surround decoder can recreate the spatial image by a controlled upmix of the mono- or stereo signal to obtain a multichannel output signal. Since the spatial image of the multi-channel input signal is parameterized, MPEG Surround allows for decoding of the same multi-channel bit-stream by rendering devices that do not use a multichannel speaker setup.
  • An example is virtual surround reproduction on headphones, which is referred to as the MPEG Surround binaural decoding process. In this mode a realistic surround experience can be provided while using regular headphones.
  • Another example is the pruning of higher order multichannel outputs, e.g. 7.1 channels, to lower order setups, e.g. 5.1 channels.
  • MPEG standardized a format known as 'Spatial Audio Object Coding' (MPEG-D SAOC).
  • MPEG-D SAOC provides efficient coding of individual audio objects rather than audio channels.
  • each speaker channel can be considered to originate from a different mix of sound objects
  • SAOC makes individual sound objects available at the decoder side for interactive manipulation as illustrated in Fig. 2.
  • multiple sound objects are coded into a mono or stereo downmix together with parametric data allowing the sound objects to be extracted prior to the rendering thereby allowing the individual audio objects to be available for manipulation e.g. by the end-user.
  • similarly to MPEG Surround, SAOC also creates a mono or stereo downmix.
  • object parameters are calculated and included.
  • the user may manipulate these parameters to control various features of the individual objects, such as position, level, equalization, or even to apply effects such as reverb.
  • Fig. 3 illustrates an interactive interface that enables the user to control the individual objects contained in an SAOC bitstream.
  • SAOC allows a more flexible approach and in particular allows more rendering based adaptability by transmitting audio objects instead of only reproduction channels. This allows the decoder-side to place the audio objects at arbitrary positions in space, provided that the space is adequately covered by speakers.
  • in SAOC there is no relation between the transmitted audio and the reproduction or rendering setup; hence arbitrary speaker setups can be used. This is advantageous for e.g. home cinema setups in a typical living room, where the speakers are rarely at the intended positions.
  • in SAOC it is decided at the decoder side where the objects are placed in the sound scene, which is often not desired from an artistic point-of-view.
  • the SAOC standard does provide ways to transmit a default rendering matrix in the bitstream, eliminating the decoder responsibility.
  • the provided methods rely on either fixed reproduction setups or on unspecified syntax.
  • SAOC does not provide normative means to fully transmit an audio scene independently of the speaker setup.
  • SAOC is not well equipped for the faithful rendering of diffuse signal components. Although there is the possibility to include a so-called multichannel background object to capture the diffuse sound, this object is tied to one specific speaker configuration.
  • the 3D Audio Alliance (3DAA) is dedicated to developing standards for the transmission of 3D audio that "will facilitate the transition from the current speaker feed paradigm to a flexible object-based approach".
  • in 3DAA, a bitstream format is to be defined that allows the transmission of a legacy multichannel downmix along with individual sound objects.
  • object positioning data is included. The principle of generating a 3DAA audio stream is illustrated in Fig. 4.
  • the sound objects are received separately in the extension stream and these may be extracted from the multi-channel downmix.
  • the resulting multi-channel downmix is rendered together with the individually available objects.
  • the objects may consist of so called stems. These stems are basically grouped (downmixed) tracks or objects. Hence, an object may consist of multiple sub-objects packed into a stem.
  • a multichannel reference mix can be transmitted with a selection of audio objects. 3DAA transmits the 3D positional data for each object. The objects can then be extracted using the 3D positional data. Alternatively, the inverse mix-matrix may be transmitted, describing the relation between the objects and the reference mix.
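The inverse mix-matrix relationship mentioned above can be sketched as follows. The 2×2 setup, signal shapes, and function names are illustrative assumptions, not 3DAA bitstream syntax; a full-rank, square mix matrix is assumed so that it can be inverted.

```python
import numpy as np

def extract_objects(reference_mix, mix_matrix):
    """Recover object signals o from a reference mix r = M @ o.

    reference_mix: (channels, samples) array
    mix_matrix:    (channels, objects) array; assumed square and invertible
    """
    inverse_mix = np.linalg.inv(mix_matrix)  # the transmitted 'inverse mix-matrix'
    return inverse_mix @ reference_mix

# Two objects mixed into a two-channel reference mix (toy example).
objects = np.array([[1.0, 0.5, -0.5],
                    [0.0, 1.0,  0.25]])
M = np.array([[0.8, 0.2],
              [0.3, 0.9]])
mix = M @ objects
recovered = extract_objects(mix, M)
```

In practice the downmix also contains the ambient residual, so the extraction is only exact for the point-source objects described by the matrix.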
  • from the description of 3DAA, sound-scene information is likely transmitted by assigning an angle and distance to each object, indicating where the object should be placed relative to e.g. the default forward direction. This is useful for point-sources but fails to describe wide sources (like e.g. a choir or applause) or diffuse sound fields (such as ambience). When all point-sources are extracted from the reference mix, an ambient multichannel mix remains. Similar to SAOC, the residual in 3DAA is fixed to a specific speaker setup.
  • both the SAOC and 3DAA approaches incorporate the transmission of individual audio objects that can be individually manipulated at the decoder side.
  • SAOC provides information on the audio objects by providing parameters characterizing the objects relative to the downmix (i.e. such that the audio objects are generated from the downmix at the decoder side)
  • 3DAA provides audio objects as full and separate audio objects (i.e. that can be generated independently from the downmix at the decoder side).
  • FIG. 5 provides an illustration of the current high level block diagram of the intended MPEG 3D Audio system.
  • object based and scene based formats are also to be supported.
  • An important aspect of the system is that its quality should scale to transparency for increasing bitrate. This puts a burden on the use of parametric coding techniques that have been used quite heavily in the past (viz. MPEG-4 HE-AAC v2, MPEG-D MPEG Surround, MPEG-D SAOC, MPEG-D USAC).
  • Envisioned reproduction possibilities include flexible loudspeaker setups (envisaged up to 22.2 channels), virtual surround over headphones, and closely spaced speakers. Flexible loudspeaker setups refer to any number of speakers at arbitrary physical locations.
  • the decoder of MPEG 3D Audio is intended to comprise a rendering module that is responsible for translating the decoded individual audio channels/objects into speaker feeds based on the physical location of the speakers, i.e. based on the specific rendering speaker configuration/ setup.
  • the rendering of the audio is accordingly dependent on the physical locations of the speakers of the rendering configuration. These positions may be determined or provided in various ways. For example, they may simply be provided by a direct user input, such as by the user directly providing a user input indicating the floor plan of speakers location, e.g. using a mobile app interface.
  • acoustic methods, both those using ultrasound and audible sound, may also be used to determine the speaker positions.
  • the acoustic methods are typically based on the concept of acoustic Time-Of- Flight, which means that the distance between any two speakers is determined by measuring the time it takes for sound to travel from one speaker to the other. This requires a microphone (or ultrasound receiver) to be integrated into each loudspeaker.
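The Time-Of-Flight principle can be sketched as follows. The speed-of-sound constant and the two-dimensional placement of three speakers from their pairwise distances are illustrative assumptions; a real system would also handle measurement noise, more speakers, and the third dimension.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees Celsius (assumed)

def tof_to_distance(time_of_flight_s):
    """Acoustic Time-Of-Flight: the distance between two speakers is the
    time sound takes to travel between them times the speed of sound."""
    return SPEED_OF_SOUND * time_of_flight_s

def place_three_speakers(d12, d13, d23):
    """Place three speakers in a 2D plane from their pairwise distances.

    Speaker 1 is put at the origin and speaker 2 on the positive x-axis;
    speaker 3 then follows from the law of cosines. This fixes the
    arbitrary translation/rotation of the configuration.
    """
    x3 = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2.0 * d12)
    y3 = math.sqrt(max(d13 ** 2 - x3 ** 2, 0.0))
    return (0.0, 0.0), (d12, 0.0), (x3, y3)

# Example: a 3-4-5 right triangle of speakers.
s1, s2, s3 = place_three_speakers(3.0, 4.0, 5.0)
```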
  • the positioning of the loudspeakers set within the room may also be relevant. Again this information may be provided manually or via automated methods. E.g. ultrasound reflections may be used to automatically detect the distance to room boundaries (walls, ceiling, floor) and general room dimensions. Together this information gives a full description of the rendering configuration.
  • Another requirement resulting from the speaker configuration independent audio provision is that the individual rendering device must position the different audio sources. Such positioning is traditionally performed at the content creation side, and is often manually performed or directly results from the recording signals. Furthermore, the positioning is conventionally performed based on a set of audio channels that are each associated with a fixed nominal position. Therefore, the rendering device merely needs to render the received audio signals and does not need to perform any positioning.
  • the rendering device needs to position the sound sources appropriately in the audio scene generated by the rendering of audio from the specific speaker configuration.
  • the positioning may often be based on position information received from the source, e.g. a desired position may be received for each audio object, but may be locally modified or changed.
  • based on the position of a given audio signal, the rendering device must generate drive signals for the individual loudspeakers such that the audio signal, at a (nominal) listening position, is perceived to originate from the given position.
  • An approach for positioning sound sources is to use a panning algorithm where the relative levels of the resulting drive signals for individual speakers are adjusted such that the audio signal is perceived as a sound source at the desired position.
  • two loudspeakers can radiate coherent signals with different amplitudes (except for the situation where the sound source is positioned exactly midway between the speakers). The listener perceives this as a virtual sound source positioned at a position between the speakers given by the relative amplitude levels.
  • the relation of amplitudes of emanating signals controls the perceived direction of the virtual source.
  • a virtual source can be positioned to any direction on the plane using two adjacent loudspeakers surrounding the virtual source. This method is called a pair-wise panning paradigm.
  • the loudspeaker pair need not be in front of the listener. There typically exists, however, some limitations in the effectiveness of the approach for loudspeaker placement to the side of the listener.
  • both loudspeakers should furthermore preferably be either in front of the listener or behind the listener. If a loudspeaker configuration has loudspeakers both behind and in front of the listener, the use of such a pair of speakers results in a gap in the directions at which virtual sources can be positioned.
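The pair-wise panning paradigm above can be sketched minimally as follows, assuming the common constant-power sine/cosine law; the text itself only requires that the relative amplitudes set the perceived direction, so other pan laws are equally valid.

```python
import math

def pairwise_pan_gains(pan):
    """Constant-power pair-wise panning between two adjacent loudspeakers.

    pan runs from 0.0 (fully at the first speaker) to 1.0 (fully at the
    second). The sine/cosine law keeps the total radiated power constant,
    which is a common (assumed) convention for amplitude panning.
    """
    theta = pan * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

# A centered virtual source gets equal gains on both speakers.
g_left, g_right = pairwise_pan_gains(0.5)
```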
  • the loudspeaker setup will include speakers that are not in the same horizontal plane, e.g. it may include elevated loudspeakers.
  • a suitable approach for 3D audio rendering is so-called Vector Base Amplitude Panning (VBAP) described in Pulkki V. Virtual source positioning using vector base amplitude panning, Journal of the Audio Engineering Society 1997; 45(6):456-466.
  • the loudspeaker setup can be divided into triangles (loudspeaker triplets), with the audio signal for a given position being positioned by a panning of one triplet.
  • a loudspeaker triplet may be formulated using vectors: the unit-length vectors l_m, l_n and l_k point from the listening position to the loudspeakers.
  • the direction of the virtual source is represented by the unit-length vector p, which is expressed as a linear weighted sum of the loudspeaker vectors:

    p = g_m · l_m + g_n · l_n + g_k · l_k

  • g_m, g_n and g_k are called the gain factors of the respective loudspeakers.
  • the loudspeaker setup is divided into triangles forming a triangle set.
  • a single triangle from the set is chosen to be used for the panning.
  • the selection can be made by calculating the gain factors in each loudspeaker triangle in the triangle set and selecting the triangle that produced non-negative factors. If the triangles in the set are non-overlapping, the selection is unambiguous.
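The gain computation and triangle selection described above can be sketched as follows. Solving p = g_m·l_m + g_n·l_n + g_k·l_k is a 3×3 linear system; the unit-power normalization of the gains and the small negative tolerance are assumed conventions, not requirements of the text.

```python
import numpy as np

def vbap_gains(p, speaker_triplet):
    """Solve p = g_m*l_m + g_n*l_n + g_k*l_k for the gain factors.

    p: unit vector towards the virtual source (length 3)
    speaker_triplet: 3x3 array whose rows are the unit vectors l_m, l_n, l_k
    Returns the gains normalized to unit power (a common convention).
    """
    L = np.asarray(speaker_triplet, dtype=float)
    g = np.linalg.solve(L.T, np.asarray(p, dtype=float))
    return g / np.linalg.norm(g)

def select_triplet(p, triplets):
    """Pick the triangle from the triangle set that yields non-negative
    gains for direction p; with non-overlapping triangles this choice is
    unambiguous. A tiny tolerance absorbs floating-point rounding."""
    for idx, trip in enumerate(triplets):
        g = vbap_gains(p, trip)
        if np.all(g >= -1e-9):
            return idx, g
    raise ValueError("no triangle in the set covers this direction")

# Toy setup: three orthogonal speakers; a source midway between them
# gets equal positive gains.
triplet = np.eye(3)
p = np.ones(3) / np.sqrt(3.0)
chosen, gains = select_triplet(p, [triplet])
```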
  • transmission is envisioned to be independent of the rendering speaker setup. Therefore, the received bitstream can be used for rendering to an arbitrary speaker setup.
  • the scene intended by the audio engineer is mapped to the available speakers using their actual positions.
  • at the receiver/decoder/renderer, in practice this may result in a transmission of audio objects along with position information indicating where the object should be rendered in (3D) space.
  • a multitude of algorithms is available to generate speaker signals from this information, for example Vector-Based Amplitude Panning.
  • panning between speakers that are widely spaced does not yield a well-placed source.
  • front-back confusion may arise when panning between the two surround speakers of a 5.1 configuration.
  • part of the audio is perceived at the location of the speakers.
  • many speaker configurations such as e.g. a 5.1 loudspeaker configuration, utilize speakers that are relatively far apart and which accordingly provide a suboptimal perception of the virtual sound source at the desired position.
  • Panning between two speakers introduces a sweet-spot, or in fact a 'sweet-plane', which is the plane where the distance to both speakers is equal.
  • for sound sources panned away from the mid-position, this 'sweet-plane' shrinks to a vertical 'sweet-line'.
  • when elevated speakers are used to pan elevated objects, the sweet-spot is also limited in height. This is even more problematic than the 'sweet-line', since people are generally not equally tall and therefore do not listen at the same height.
  • Solutions based on crosstalk cancelation can be used to introduce improved localization cues at the ears of the listener.
  • approaches are complex, sensitive to imperfections, have a narrow sweet-spot due to phase manipulation, and require personalized components in order to work well.
  • an improved approach would be advantageous and in particular an approach allowing increased flexibility, improved positioning of audio sources, improved adaptability to different rendering configurations, reduced complexity, an improved user experience, and/or improved performance would be advantageous.
  • the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages, singly or in any combination.
  • an apparatus for generating drive signals for audio transducers comprising: an audio receiver for receiving an audio signal; a position receiver for receiving position data indicative of a desired rendering position for the audio signal; a drive signal generator for generating at least a first drive signal for a first audio transducer associated with a first position and a second drive signal for a second audio transducer associated with a second position, the drive signal generator being arranged to generate the drive signals in response to a panning for the audio signal in response to the desired rendering position; and wherein the drive signal generator is arranged to decorrelate the first drive signal relative to the second drive signal, a degree of decorrelation being dependent on an indication of the first position.
  • the invention may provide an improved audio experience, and in particular an improved spatial audio experience.
  • the approach may support rendering over a wide range of loudspeaker configurations with increased adaptability of the user experience to the given configuration.
  • an improved perception of a sound source at a desired position may be provided, and often with a reduced sensitivity to specific loudspeaker configurations.
  • improved performance may be achieved for loudspeaker configurations having a relatively large distance between loudspeakers.
  • the approach may in many scenarios result in mitigation of imperfections of a panning operation.
  • the perception of a given sound source as also originating from the positions of the speakers involved in the panning may be reduced substantially.
  • the approach may specifically reduce the correlation between the speaker signals used to generate a panned phantom source thereby reducing the perceptibility of imperfections of the panning operation. For example, using panning for localization between widely spaced speakers tends to result in artifacts, including the perception of additional sound sources at the speaker positions.
  • the sound source is rendered more diffusely but still with a directional component originating from the desired position. It is often preferable to have a perceived sound source which is perceived as coming from a less-defined, but still more or less correct direction, than to have a sound source which is perceived as, for example, coming from two distinct loudspeaker positions or from a completely wrong position (e.g. front-back reversal).
  • the panning operation may comprise and/or consist in setting relative levels and/or gains for the first and second drive signal in response to the desired rendering position.
  • the levels/ gains may be set such that the audio signal will be perceived to originate from the desired rendering position at a (nominal) listening position.
  • the desired rendering position may be a three dimensional, two dimensional or one dimensional position.
  • the panning operation may be a three dimensional, two dimensional or one dimensional operation.
  • a three dimensional system may consider both a horizontal angular direction (azimuth), a vertical angular direction (elevation), and a distance from a (nominal) listening position.
  • a two dimensional system may e.g. consider only two of these dimensions, such as a horizontal angular direction (azimuth) and a distance from a (nominal) listening position.
  • a one dimensional system may e.g. consider only a horizontal angular direction (azimuth).
  • the desired rendering position may be an angular direction (azimuth) from a (nominal) listening position.
  • the apparatus may be arranged to receive audio transducer position data indicative of the positions of the first and second audio transducers, i.e. to receive an indication of at least the first position.
  • the data may e.g. be received from an internal source (such as a memory), a user input, or a remote source.
  • the audio signal may be received from an internal or external source.
  • the desired rendering position may also be received from any internal or external source, and may for example be received from a remote source together with the audio signal, or may be locally provided or generated.
  • the first position may be a three dimensional, two dimensional, or one dimensional position.
  • the second position may be a three dimensional, two dimensional, or one dimensional position.
  • the first position may be represented by any indication of a position, including a three dimensional, two dimensional, or one dimensional position indication.
  • the first (and/or second) position may be represented by an angular direction (azimuth) from a (nominal) listening position.
  • the position receiver may receive an indication of the first position (from an external or internal source), and the drive signal generator may determine the degree of decorrelation dependent on the indication of the first position.
  • the indication of the first position may be an indication of an absolute position or may e.g. be an indication of a relative position, such as an indication of the first position relative to the second position and/or to a listening position.
  • the indication of the first position may be a partial indication of the first position (e.g. may only provide an indication in one dimension, such as an indication of an angle from a listening position to the first position, e.g. relative to a reference direction).
  • the audio signal may for example be an audio object, audio scene, audio channel or audio component.
  • the audio signal may be part of a set of audio signals, such as e.g. an audio component in an encoded data stream comprising a plurality of (possibly different types of) audio items.
  • the degree of decorrelation is dependent on an indication of the first position relative to the second position.
  • This may provide improved rendering in many embodiments, and may in particular allow efficient and accurate adaptation of the characteristics of the rendering to the specific audio transducer configuration.
  • the relative positions of audio transducers involved in a panning operation may have a strong influence on the performance, accuracy and possible artifacts of the operation, and thus an adaptation of the decorrelation based on a measure of a relative positioning of the audio transducers may provide a particularly suitable adaptation of the rendering.
  • the dependency of the degree of decorrelation on the (indication of) the first position may specifically be a dependency on the (indication of) the first position relative to the second position.
  • the indication of the first position relative to the second position may for example be an indication of the difference between the positions, e.g. measured as a distance along a line between the first and second position, or as an angular distance measured relative to a (typically nominal) listening position.
  • the degree of decorrelation is dependent on an indication of an angle between a direction from a listening position to the first position and a direction from the listening position to the second position.
  • This may provide improved rendering in many embodiments, and may in particular allow efficient and accurate adaptation of the characteristics of the rendering to the specific audio transducer configuration.
  • difference/distance to audio transducers involved in a panning operation from a listening position may have a strong effect on the performance, accuracy and possible artifacts of the operation, and thus an adaptation of the decorrelation based on a measure of the angular difference/ distance may provide a particularly suitable adaptation of the rendering.
  • the dependency of the degree of decorrelation on the (indication of) the first position may specifically be a dependency on the (indication of) the angle between a direction from a listening position to the first position and a direction from the listening position to the second position.
  • the degree of decorrelation of the first drive signal relative to the second drive signal is dependent on an indication of a distance between the first position and the second position.
  • This may provide an improved adaptation to specific audio transducer configurations, and may in particular allow an improved trade-off between degradations resulting from imperfect panning and the definiteness of the perceived sound source position.
  • An improved user experience is typically provided with the localization effect being adapted to the specific audio transducer setup.
  • the distance may be an angular distance.
  • the angular distance may be measured from a (nominal) listening position.
  • the drive signal generator is arranged to increase decorrelation for an indication of increasing distance.
  • the distance may be an angular distance.
  • the angular distance may be measured from a (nominal) listening position.
  • the drive signal generator may be arranged to increase decorrelation for increasing angular distance between the first and second positions. The degree of decorrelation may specifically be a monotonically increasing function of the distance.
  • the drive signal generator is arranged to only decorrelate the first drive signal relative to the second drive signal when the indication of the distance is indicative of a distance above a threshold.
  • the threshold may in many embodiments advantageously correspond to an angular difference (from a nominal listening position) belonging to the interval [45°;75°], [50°;70°], or [55°;65°], and may specifically advantageously be substantially 60°.
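The threshold behaviour can be sketched as follows. The 60° default comes from the text; the linear ramp above the threshold is an illustrative assumption, since the text only requires that the degree of decorrelation increase monotonically with the angular distance.

```python
import math

def angular_distance_deg(dir1, dir2):
    """Angle in degrees between two unit direction vectors (e.g. from
    the nominal listening position towards the two speakers)."""
    dot = sum(a * b for a, b in zip(dir1, dir2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def decorrelation_degree(angle_deg, threshold_deg=60.0):
    """Zero decorrelation below the threshold; above it, a monotonically
    increasing degree in [0, 1], reaching 1 at 180 degrees (assumed ramp)."""
    if angle_deg <= threshold_deg:
        return 0.0
    return min(1.0, (angle_deg - threshold_deg) / (180.0 - threshold_deg))
```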
  • the degree of decorrelation of the first drive signal relative to the second drive signal is dependent on an indication of a distance between the desired rendering position and at least one of the first position and the second position.
  • This may provide an improved adaptation to specific audio transducer configurations, and may in particular allow an improved trade-off between degradations resulting from imperfect panning and a degree of localization.
  • An improved user experience is typically provided with the localization effect being adapted to the specific audio transducer setup and position being rendered.
  • the distance may be an angular distance.
  • the angular distance may be measured from a (nominal) listening position.
  • the degree of decorrelation is dependent on a distance between the desired rendering position and a closest speaker position of at least one of the first position and the second position.
  • the drive signal generator is arranged to increase decorrelation for an indication of increasing distance.
  • the distance may be an angular distance.
  • the angular distance may be measured from a (nominal) listening position.
  • the drive signal generator may be arranged to increase decorrelation for increasing angular distance between the desired rendering position and at least one of the first and second positions.
  • the degree of decorrelation may specifically be a monotonically increasing function of the distance.
  • the degree of decorrelation may be increased for an increasing distance between the desired rendering position and a closest speaker position of at least one of the first position and the second position.
  • the drive signal generator furthermore comprises a frequency response modifier arranged to modify a frequency response for at least the first drive signal in response to the desired rendering position.
  • This may provide an improved rendering in many embodiments and may in particular allow improved direction perception by a listener.
  • the feature may allow improved back to front resolution in many scenarios.
  • the modification of the frequency response is dependent on an ear response for a direction from a listening position to the desired rendering position.
  • This may provide an improved rendering in many embodiments and may in particular allow improved direction perception by a listener.
  • the feature may allow improved back to front resolution in many scenarios.
  • the modification of the frequency response may specifically be dependent on an ear response for a direction from the listening position to the desired rendering position relative to a reference direction, e.g. corresponding to a nominal listener orientation.
  • the drive signal generator furthermore comprises a frequency response modifier arranged to modify a frequency response for at least the first drive signal dependent on the first position.
  • This may provide an improved rendering in many embodiments and may in particular allow improved direction perception by a listener.
  • the feature may allow improved back to front resolution in many scenarios.
  • different frequency equalization/ coloration may be used for different speakers.
  • the drive signal generator may further comprise means arranged to modify a frequency response for the second drive signal dependent on the first position.
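A minimal sketch of such a direction-dependent frequency response modification follows. The one-pole low-pass filter and the azimuth-to-coefficient mapping are illustrative stand-ins for a measured ear (pinna) response; real implementations would use HRTF-derived equalization curves.

```python
import math

def rear_lowpass_coefficient(azimuth_deg, min_a=0.2):
    """Map the rendering azimuth (0 = straight ahead, 180 = directly
    behind) to a one-pole coefficient: no filtering in front, strongest
    high-frequency attenuation behind, mimicking (crudely) the reduced
    high-frequency content of sounds arriving from the rear."""
    rearness = max(0.0, -math.cos(math.radians(azimuth_deg)))  # 0 front, 1 back
    return 1.0 - (1.0 - min_a) * rearness

def one_pole_lowpass(signal, a):
    """y[n] = a*x[n] + (1-a)*y[n-1]; unity gain at DC, attenuating
    progressively towards Nyquist for small a."""
    out, state = [], 0.0
    for x in signal:
        state = a * x + (1.0 - a) * state
        out.append(state)
    return out
```

For a frontal source the coefficient is 1.0 (the drive signal passes unchanged); for a rear source the high frequencies are attenuated, which may help back-to-front discrimination.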
  • the degree of decorrelation of the first drive signal relative to the second drive signal is dependent on an angular direction from a listening position to the desired rendering position relative to a reference direction.
  • the reference direction may typically be a listening direction, such as a nominal forward direction of a listener at a nominal listening position.
  • the signal generator is further arranged to generate a third drive signal for a third audio transducer associated with a third position in response to the panning operation for the audio signal in response to the desired rendering position; and the drive signal generator is arranged to decorrelate the third drive signal relative to the first drive signal and to decorrelate the third drive signal relative to the second drive signal.
  • the signal generator comprises a decorrelator for decorrelating the first drive signal relative to the second drive signal.
  • a method of generating drive signals for audio transducers for rendering an audio signal comprising: receiving the audio signal; receiving position data indicative of a desired rendering position for the audio signal; generating at least a first drive signal for a first audio transducer associated with a first position and a second drive signal for a second audio transducer associated with a second position, the drive signals being generated in response to a panning for the audio signal in response to the desired rendering position; and wherein generating the first drive signal comprises decorrelating the first drive signal relative to the second drive signal, a degree of decorrelation being dependent on an indication of the first position.
  • Fig. 1 illustrates an example of elements of an MPEG Surround system in accordance with the prior art
  • Fig. 2 exemplifies the manipulation of audio objects possible in MPEG SAOC
  • Fig. 3 illustrates an interactive interface that enables the user to control the individual objects contained in an SAOC bitstream
  • Fig. 4 illustrates an example of the principle of audio encoding of 3DAA in accordance with the prior art
  • Fig. 5 illustrates an example of the principle of audio encoding envisaged for MPEG 3D Audio in accordance with the prior art
  • Fig. 6 illustrates an example of an audio rendering system in accordance with some embodiments of the invention
  • Fig. 7 illustrates an example of a loudspeaker rendering configuration
  • Fig. 8 illustrates an example of an audio rendering unit in accordance with some embodiments of the invention
  • Fig. 9 illustrates an example of an audio rendering unit in accordance with some embodiments of the invention.
  • Fig. 10 illustrates an example of an audio rendering unit in accordance with some embodiments of the invention
  • Fig. 11 illustrates an example of a three speaker rendering configuration
  • Fig. 12 illustrates an example of an audio rendering unit in accordance with some embodiments of the invention
  • Fig. 13 illustrates an example of a panning operation processing for rendering of an audio signal in accordance with some embodiments of the invention
  • Fig. 14 illustrates an example of ear frequency responses for audio signals from different directions.
  • Fig. 15 illustrates an example of a frequency response modification for rendering of an audio signal in accordance with some embodiments of the invention.
  • Fig. 6 illustrates an example of an audio renderer in accordance with some embodiments of the invention.
  • the audio renderer comprises an audio receiver 601 which is arranged to receive audio data for audio that is to be rendered.
  • the audio data may be received from any internal or external source.
  • the audio data may be received from any suitable communication medium including direct communication or broadcast links.
  • communication may be via the Internet, data networks, radio broadcasts etc.
  • the audio data may be received from a physical storage medium such as a CD, Blu-RayTM disc, memory card etc.
  • the audio data may be generated locally, e.g. by a 3D audio model (as e.g. used by a gaming application).
  • the audio data comprises a plurality of audio components which may include audio channel components associated with a specific rendering loudspeaker configuration (such as a spatial audio channel of a 5.1 surround signal) or audio objects that are not associated with any specific rendering loudspeaker configuration.
  • the audio signal is specifically one component of the received audio data, and indeed the following description will focus on a rendering of an audio object of the received audio data.
  • the described approach may be used with other audio components, including for example audio channels and audio signals extracted e.g. from audio channels (e.g. corresponding to individual sound sources embedded in the audio channels).
  • rendering of other audio signals may be performed in parallel and these audio signals may be rendered simultaneously from the same loudspeakers, and indeed the rendering of these other audio signals may follow the same approach.
  • all received audio components of the received audio data will be rendered in parallel thereby generating an audio scene represented by the audio data. It will also be appreciated that the described approach may only be applied to some of the audio components, or may indeed be applied to all received audio components.
  • the audio renderer of Fig. 6 further comprises a position receiver 603 which is arranged to receive position data which is indicative of a desired rendering position for the audio signal.
  • a single data stream may be received, e.g. via the Internet, with the single data stream comprising a number of audio signals defining audio objects and position data defining a recommended rendering position for each of the audio objects.
  • the following description will focus on an example wherein an audio signal corresponding to an audio object is rendered such that it is perceived to originate at a desired position indicated by position data received together with the audio signal.
  • the audio signal may be rendered at other positions.
  • the position indicated by the received position data may be modified locally, e.g. in response to a manual user input.
  • the desired position at which the audio renderer tries to render the audio signal may be determined by a local modification or manipulation of the received indicated position.
  • the position data may not be received with the audio data but may be received from another source, including both external and internal sources.
  • the audio renderer may include a position processor which automatically or in response to user inputs generates desired positions for various audio objects.
  • Such an embodiment may be particularly suitable for scenarios wherein the audio data is also locally generated. For example, for a gaming or virtual world application, a three dimensional model may be generated and used to generate both audio signals and associated positions.
  • the audio receiver 601 and position receiver 603 are coupled to a rendering unit 605 which is arranged to generate signals for individual audio transducers.
  • the rendering unit 605 generates one signal for each of the audio transducers, and thus the output set of signals comprises one individual signal for each audio transducer of a set of audio transducers.
  • the system of Fig. 6 renders the audio using a plurality of audio transducers in the form of a set of loudspeakers 607, 609 that are (assumed to be) arranged in a given speaker configuration.
  • Fig. 7 illustrates an example of speaker configuration comprising five speakers, namely a center speaker C, a left front speaker L, a right front speaker R, a left surround (or rear) speaker LS, and a right surround (or rear) speaker RS.
  • the speakers are in this example positioned at positions in a circle around a listening position.
  • the speaker configuration is in the example referenced to a listening position and furthermore to a listening orientation.
  • a nominal listening position and orientation is assumed for the rendering.
  • the rendering seeks to position the audio signal such that it, for a listener positioned at the nominal listening position and with the nominal listening orientation, will be perceived to originate from a sound source in the desired direction.
  • the positions may specifically be positions that are defined with respect to the (nominal) listening position and to the (nominal) listening orientation. In many embodiments, positions may only be considered in a horizontal plane, and distance may often be ignored. In such examples, the position may be considered as a one- dimensional position which is given by an angular direction relative to the reference direction which is given as a specific direction from the listening position.
  • the reference direction may typically correspond to the direction assumed to be directly in front of the nominal listener, i.e. to the forward direction. Specifically, in Fig. 7, the reference direction is that from the listening position to the front center speaker C.
  • the angle between the reference direction and the direction from the listening position to a given speaker will simply be referred to as the angular position of the speaker.
  • the angular positions of the front speakers are at ±30°, and the angular positions of the rear speakers are at ±110°.
  • the angular distance between the front right speaker R and the surround right speaker RS is 80°.
  • Fig. 7 corresponds to a five channel surround sound configuration.
  • other loudspeaker rendering configurations may be used, including for example a larger number of speakers, elevated speakers, asymmetric speaker locations etc.
  • the rendering unit 605 is arranged to render the audio signal to be perceived to originate from the desired position.
  • the desired position is given as an angle with respect to the reference (forward) direction from the listening position to the center speaker C.
  • the rendering unit 605 is arranged to position the audio signal at a sound source position using a panning operation.
  • the positioning of the audio signal is specifically by panning using the two nearest speakers.
  • the rendering unit 605 will perform a panning between the center speaker C and the right speaker R for an angle in the interval of [0°;-30°]
  • the rendering unit 605 will perform a panning between the center speaker C and the left speaker L for an angle in the interval of [0°;30°]
  • the rendering unit 605 will perform a panning between the right speaker R and the right surround speaker RS for an angle in the interval of [-30°;-110°]
  • the rendering unit 605 will perform a panning between the left speaker L and the left surround speaker LS for an angle in the interval of [30°;110°]
  • the rendering unit 605 will perform a panning between the left surround speaker LS and the right surround speaker RS for an angle in the interval of [-110°;-180°] or [110°;180°].
  • the rendering unit 605 is arranged to select the two speakers nearest to the desired position and to position the audio signal between the two speakers using panning.
  • the two selected speakers are illustrated as speakers 607, 609 which e.g. may represent any speaker pair of Fig. 7 as described above.
  • the rendering unit 605 comprises a panning processor 611 which is arranged to perform a panning operation in order to generate output signals that when rendered will result in the audio signal being perceived by a listener at the nominal listening position to predominantly originate from the desired position.
  • the panning operation specifically determines relative signal levels for the sound rendered from the first speaker 607 and the second speaker 609 of the selected speaker pair 607, 609.
  • the panning includes determining a relative level difference between the first drive signal and the second drive signal corresponding to the desired rendering position.
  • the amplitude gains for the first and second speaker's drive signals are determined by means of so-called panning.
  • Panning is a process in which, depending on the position of a virtual sound source between two or more speakers, the signal corresponding to the virtual sound source is played over these speakers with amplitude gains determined by the relative distance of the virtual sound source to each speaker.
  • if the virtual sound source is close to the first speaker, the amplitude gain for the drive signal of that speaker will be relatively high, e.g. 0.9, whereas the gain for the second speaker will be relatively low, e.g. 0.1, thereby creating the impression of a virtual sound source between the first and second speaker, close to the first speaker.
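The gain determination described above can be sketched with a constant-power (sin/cos) panning law; the specific law, function name and speaker angles below are illustrative assumptions, as the text does not fix a particular panning algorithm:

```python
import math

def panning_gains(theta, theta1, theta2):
    """Constant-power panning gains for a virtual source at angle
    `theta` between two speakers at angles `theta1` and `theta2`
    (degrees). A hypothetical sketch, not the patent's algorithm."""
    # Normalised position of the source between the speakers:
    # 0 at speaker 1, 1 at speaker 2.
    p = (theta - theta1) / (theta2 - theta1)
    p = min(max(p, 0.0), 1.0)
    # sin/cos law keeps g1^2 + g2^2 = 1 (constant perceived power).
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)

# Source midway between C (0 deg target, speakers at -30 and +30):
g1, g2 = panning_gains(0.0, -30.0, 30.0)
```

A source close to one speaker yields a high gain for that speaker and a low gain for the other, matching the 0.9/0.1 example above.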
  • the panning processor 611 is coupled to the audio receiver 601 and the position receiver 603 and receives the audio signal and the desired position. It then proceeds to generate a signal for each of the first and second speaker 607, 609, i.e. the panning processor 611 generates two signals from the audio signal, namely one for the first speaker 607 and one for the second speaker 609. The two generated signals have an amplitude value which when rendered from the positions of the first and second speaker 607, 609 corresponds to an audio source perceived to be at the desired position.
  • the first signal is used to drive the first speaker 607 and the second signal is used to drive the second speaker 609. It will be appreciated that the first and/or second signals may in some embodiments directly be used to drive the loudspeakers 607, 609 but in many embodiments the signal paths may include further processing including for example amplification, filtering, impedance matching etc.
  • the rendering unit 605 of Fig. 6 is furthermore arranged to decorrelate one of the signals relative to the other signal.
  • the decorrelation may be performed prior to the panning, as part of the panning operation, or after the panning operation.
  • the rendering unit 605 may initially generate a decorrelated version of the audio signal and then generate the first and second signals by a gain adjustment of the original audio signal and the decorrelated version of this.
  • the panning operation is performed on the audio signal thereby generating the first and second signal.
  • the decorrelation is then performed by applying a decorrelation to the first signal.
  • the rendering unit 605 comprises a decorrelator 613 which decorrelates the first signal.
  • the decorrelator 613 is a processing component that generates an output signal that preserves the spectral and temporal amplitude envelopes of the input signal but has a cross-correlation of less than one between the input and output.
  • Many practical decorrelators will have a cross-correlation of close to zero between the input and the output.
  • the decorrelator 613 of Fig. 6 is an adjustable decorrelator for which the degree of correlation may be varied.
  • the decorrelator 613 may provide a partial and variable decorrelation thus allowing the input signal and output signal to have a cross-correlation of lower than one but possibly higher than zero.
  • While panning is highly suitable for positioning virtual sound sources in positions between physical loudspeaker positions, it can also introduce artifacts in some scenarios. Specifically, if the distance between the speakers is too large, the listener will in addition to the desired phantom sound source position also tend to perceive secondary sound sources at the positions of the used loudspeakers. Indeed, for typical speaker distances, the panning operation will tend to result in clearly noticeable artifacts.
  • the inventors have specifically realized that it is often preferable to have a sound source which is perceived as coming from a less-defined, but still more or less correct direction, than to have a source which is perceived, e.g., as coming from two distinct loudspeaker positions or from a wrong position (e.g. front-back reversal).
  • the degree of decorrelation is dependent on the rendering speaker configuration. Specifically, it will depend on at least the position of one of the speakers, and typically on the positions of two speakers relative to each other.
  • the rendering unit 605 may receive information of the position of the speakers, e.g. simply provided as an azimuth relative to a nominal direction (typically the forward direction). In some embodiments, such information may be received by the rendering unit 605 from a measurement unit that performs measurements to determine relative positions. Alternatively or additionally, the position information may be received as a user input, e.g. by the user simply inputting the approximate angle from the listening position to each of the speakers relative to a forward direction. In yet other embodiments, the position information may be stored in the audio renderer and simply retrieved from memory by the rendering unit 605. In some embodiments, the position information may be assumed position information.
  • the degree of decorrelation is dependent on a distance between the positions of the first and second loudspeakers 607, 609, i.e. between the speakers used for the panning.
  • the distance may be determined as an actual distance, e.g. along a desired direction. However, in many scenarios, the distance may be determined as an angular distance measured from the (nominal) listening position.
  • the degree of decorrelation may be dependent on the angle between the lines from the listening position to the two loudspeakers 607, 609.
  • the system may specifically be arranged to only introduce additional decorrelation if the distance exceeds a threshold. Indeed, for an angular distance sufficiently low, it has been found that panning provides a very accurate perception with only very low and typically insignificant artifacts. It has been found that in many scenarios and for most people, loudspeakers that are at an angle of less than 60° provide an accurate perception of a sound source at the desired position and with typically insignificant degradations.
  • the rendering unit 605 may be arranged to increase decorrelation for increasing distance, or equivalently the rendering unit 605 may determine the decorrelation amount as a monotonically increasing function of the distance between the speakers.
  • the rendering unit 605 thus generates a desired correlation ( ⁇ ) between the speaker signals.
  • This desired correlation is dependent on the distance between the speakers between which a source is panned, where a lower correlation (and thus higher decorrelation) is chosen for wider spaced speakers.
  • Speaker spacing is never larger than 180°, as for larger angles a smaller alternative angle describing the same configuration exists. With panning on a speaker spacing of 180° there is only lateralization, so a corresponding correlation of zero may be used for this spacing.
  • a suitable function may be selected, e.g. a linear interpolation:
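One possible realization of such a linear interpolation is sketched below. The 60° threshold and the exact linear shape are assumptions derived from the surrounding text (panning works well for spacings below roughly 60°; a correlation of zero is suited to a 180° spacing), not values taken from this document:

```python
def desired_correlation(spacing_deg, threshold_deg=60.0):
    """Desired inter-signal correlation (lambda) as a function of the
    angular speaker spacing in degrees: full correlation up to the
    threshold, falling linearly to zero at 180 degrees (pure
    lateralization). Threshold and shape are illustrative assumptions."""
    if spacing_deg <= threshold_deg:
        return 1.0
    return max(0.0, 1.0 - (spacing_deg - threshold_deg) / (180.0 - threshold_deg))
```

For the 80° front-to-surround spacing mentioned above, such a function would yield a correlation between one and zero, i.e. a partial decorrelation.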
  • rendering unit 605 may be implemented in different ways in different embodiments.
  • Fig. 8 illustrates an example of an implementation of the rendering unit 605.
  • a decorrelator which performs a full decorrelation (i.e. with a cross correlation between input and output of substantially zero) is used.
  • the audio signal to be panned between two speakers 607, 609 is first split and decorrelated in order to achieve a desired correlation level before panning gains are applied to the two signals.
  • the degree of decorrelation is controlled by the first drive signal being generated as a weighted summation of the original audio signal and the fully decorrelated signal.
  • the relation between the decorrelation gains (a₁, a₂) may for example be chosen to preserve signal energy.
  • a₁ = cos(arccos(λ))
  • a₂ = sin(arccos(λ)), where λ indicates the desired correlation (λ ∈ [0, …, 1]) between the two speaker signals.
  • the panning is then performed by scaling the resulting signals using appropriate panning gains (g₁, g₂).
  • An advantage of the example of Fig. 8 is that no adjustments need to be made to the panning gains (g₁, g₂) obtained from a panning algorithm which does not consider any decorrelation (e.g. from a conventional algorithm for determining panning gains).
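The Fig. 8 structure described above can be sketched as follows; the function name is hypothetical, and the input `d` is assumed to be a fully decorrelated version of `x` as produced by the decorrelator:

```python
import numpy as np

def pan_with_decorrelation(x, d, lam, g1, g2):
    """Sketch of the Fig. 8 structure: the first drive signal is a
    weighted sum of the original signal x and a fully decorrelated
    version d, mixed so that the cross-correlation between the two
    drive signals equals the desired correlation lam; the unmodified
    panning gains g1, g2 are applied afterwards."""
    a1 = np.cos(np.arccos(lam))   # equals lam
    a2 = np.sin(np.arccos(lam))   # equals sqrt(1 - lam**2)
    s1 = a1 * x + a2 * d          # energy preserved: a1**2 + a2**2 == 1
    s2 = x
    return g1 * s1, g2 * s2
```

With unit panning gains and an orthogonal `d`, the normalized cross-correlation of the two outputs equals `lam`, as intended.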
  • Fig. 9 illustrates another example wherein the panning gains are applied before decorrelation.
  • the decorrelation gains ( ⁇ , ⁇ , a 2 ) should be chosen differently to preserve the correct energy in the output signals.
  • the decorrelation and panning may be performed jointly in a matrix operation on the audio signal and a decorrelated version thereof:
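The joint matrix itself is not reproduced in this extraction; one possible form, consistent with the Fig. 8 gains above and acting on the column vector [x, d]ᵀ (audio signal and its decorrelated version), is sketched below as an assumption:

```python
import numpy as np

def joint_panning_matrix(lam, g1, g2):
    """One possible joint panning/decorrelation matrix acting on
    [x, d]^T. Illustrative sketch; the document does not give the
    actual matrix. Row 1 mixes original and decorrelated signal,
    row 2 passes the original signal, with panning gains folded in."""
    a1, a2 = lam, np.sqrt(1.0 - lam**2)
    return np.array([[g1 * a1, g1 * a2],
                     [g2,      0.0    ]])
```

Applying the matrix to a block of samples stacked as rows [x; d] then produces both drive signals in a single operation.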
  • the degree of decorrelation may additionally or alternatively be dependent on the desired rendering position.
  • the decorrelation may be dependent on a relative distance from the rendering position to the nearest loudspeaker used for the panning.
  • the rendering unit 605 may increase the decorrelation for an increasing distance, or equivalently the amount to decorrelation may be a monotonically increasing function of the distance from the desired rendering position to the nearest speaker position.
  • a sound source panned close to one of the speakers will thus have a higher correlation than a sound source panned halfway between the speakers.
  • Such an approach may provide an improved user experience in many scenarios.
  • the approach may reflect that panning works better for positions close to the speaker positions used in the panning than when further apart.
  • the degree of diffuseness in the rendering of the audio signal is adapted to reflect the degree of artifacts, thereby automatically achieving that the artifacts are obscured in dependence on the significance of the artifacts.
  • the amount of decorrelation may depend on the direction towards the desired rendering position with respect to a reference direction.
  • a nominal listening position and nominal front direction may be defined.
  • the nominal front direction may be from the listening position to the center speaker C.
  • the amount of decorrelation may be varied dependent on the angle between this frontal direction and the direction towards the desired rendering position.
  • the frontal direction may be assumed to correspond to the way a user is facing when listening to the rendering sound.
  • the rendering unit 605 may in such an embodiment introduce a higher decorrelation for a desired rendering position at an angle of 90° than for an angle of 0°.
  • more decorrelation may be introduced for desired rendering positions that are typically to the side of the user than for desired rendering positions that are typically in front or to the back of the user.
  • the system may automatically adjust for variations in the degree of degradation that is perceived as a function of the rendering position with respect to the user.
  • the approach may be used to adapt the rendering to reflect that interpolation by the human brain tends to be more reliable for sound sources in the front or to the back of the listener and less reliable for rendering positions to the sides of the listener.
  • the exact amount of decorrelation may depend on a plurality of factors.
  • an algorithm for determining the desired decorrelation may take into account both the distance between the speakers, the distance from the desired rendering position to the nearest speaker position, as well as to whether the desired rendering position is in front or to the side of the listener.
  • the panning operation may include more than two speakers thereby allowing positioning in more dimensions.
  • panning not only takes place in the horizontal plane, but can also be performed in the vertical direction.
  • Such an approach may specifically use three speakers instead of two as shown in Fig.11.
  • three correlations are relevant, and in order to control these, two (possibly partial) decorrelations can be applied as shown in Fig. 12.
  • the combined panning and decorrelation gains may be determined as:
  • the exemplary approach of Fig. 12 uses the input signal (S) for one driver and decorrelates the two other driver signals from the input signal accordingly.
  • Another approach may reorder the rows of matrix R₃ such that the speaker with the most energy is fed the scaled input signal (i.e. the first row is permuted to the speaker signal (1, 2 or 3) that has the highest panning gain).
  • An advantage of this approach is that the loudest output signal does not contain any decorrelator signal.
  • a decorrelator signal may introduce artifacts affecting audio quality.
  • the remaining rows may be permuted such that with decreasing energy more decorrelation signal energy is added.
  • the amount of decorrelation applied to a drive signal may depend on an energy of that drive signal, and in particular the decorrelation applied to a drive signal may depend on an energy of the drive signal relative to an energy of another drive signal. In some embodiments, no decorrelation is applied to the drive signal having the highest energy.
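The reordering idea can be sketched as below. The mixing coefficients derived from `lams` are illustrative placeholders, not the actual R₃ matrix (which is not reproduced in this extraction); the function name is hypothetical:

```python
import numpy as np

def ordered_mix_matrix(g, lams=(0.9, 0.7)):
    """Sketch: for three panning gains g, feed the loudest speaker
    the plain (scaled) input signal and add progressively more
    decorrelator energy to quieter speakers. Matrix columns act on
    [input, d1, d2]^T (input plus two decorrelator outputs)."""
    order = np.argsort(g)[::-1]            # speaker indices, loudest first
    R = np.zeros((3, 3))
    R[order[0], 0] = g[order[0]]           # loudest: no decorrelator signal
    a1, b1 = lams[0], np.sqrt(1 - lams[0]**2)
    a2, b2 = lams[1], np.sqrt(1 - lams[1]**2)
    R[order[1]] = g[order[1]] * np.array([a1, b1, 0.0])  # some decorrelation
    R[order[2]] = g[order[2]] * np.array([a2, 0.0, b2])  # more decorrelation
    return R
```

This keeps any decorrelator contribution, and its potential artifacts, out of the loudest output signal.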
  • R 3 defines three vectors describing the individual speaker signals:
  • the final degree of freedom, i.e. rotation around the s axis, may be used to ensure maximum signal continuity, e.g. by aligning the signals such that the vector with the maximum length of the three vectors z₁, z₂ and z₃ is always associated with a single decorrelator, e.g. d₁.
  • the contribution of one of the decorrelators may be minimized by rotating the vectors z₁, z₂ and z₃ around the s axis. This can be used beneficially to reduce the complexity of the least present decorrelator.
  • the rendering unit 605 may be arranged to modify the frequency response for at least one of the first and second signals dependent on the desired rendering position.
  • the transfer function representing the signal path from the audio signal (or decorrelated audio signal) to the drive signal for the loudspeaker may be dependent on the desired rendering position.
  • the frequency response may be modified to reflect an ear response.
  • the (front-back) asymmetry of a person's head and specifically ears introduce a frequency selective variation that depends on the direction from which the sound is received.
  • ear and head shadowing may introduce a frequency response dependent on the direction from which the sound is received.
  • the rendering unit 605 may emulate elements of such a frequency variation to improve the position perception for the listener.
  • the frequency response may alternatively or additionally be modified dependent on the position of the loudspeakers.
  • the frequency response may be dependent on the angle between the speakers and a reference direction, which may specifically be a direction corresponding to a (nominal) forward direction.
  • the frequency response may accordingly be different for speakers at different positions.
  • the frequency response may be dependent on both the desired rendering position and the speaker positions.
  • equalization may be applied to account for coloration differences due to speaker positions vs. intended source position.
  • sources in the back rendered with the surround speakers of a 5.1 configuration may benefit from a lowered level of high frequencies to account for increased head- and ear-shadowing for rear sources compared to shadowing from the position of the speakers.
  • coloration may also be applied to improve the perception of a virtual sound source. For example, a virtual phantom center back speaker can be realized by playing a coherent (or decorrelated) sound through both the left and right surround speakers.
  • in some scenarios, such a virtual back speaker is perceived in front of the listener (known as so-called front-back confusion).
  • One of the effects that cause this front-back confusion is a mismatch in the spectral cues between an actual center back speaker and the phantom sound source.
  • the frequency modification applied by the head and ear shadowing for sounds arriving from the back of the listener is not present for either of the sounds arriving from the surround speakers since these are substantially to the side of the user.
  • this effect can be emulated thereby reducing the risk of front-back confusion.
  • a position dependent filtering may be applied to the signals for the speakers.
  • the speaker signal is filtered with a filter h(p_spkX, p_spkY, p_obj) (e.g. an FIR filter) to obtain a processed speaker signal.
  • the filter h(p_spkX, p_spkY, p_obj) is a function of the actual speaker positions p_spkX, p_spkY and the object/virtual channel position, i.e. the desired rendering position.
  • the filter may be tabulated or may be parameterized.
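The position-dependent filtering step can be sketched as a convolution with a looked-up FIR filter; the filter coefficients below are an illustrative placeholder (a crude high-frequency roll-off), not values from this document, and the lookup from positions to coefficients is application specific and not shown:

```python
import numpy as np

def apply_position_filter(speaker_signal, h):
    """Filter a speaker signal with an FIR filter h that would be
    tabulated or parameterized from the speaker positions and the
    desired rendering position (lookup not shown)."""
    return np.convolve(speaker_signal, h)

# Illustrative 2-tap low-pass emulating extra head/ear shadowing
# for a phantom rear source (assumed coefficients).
h = np.array([0.5, 0.5])
y = apply_position_filter(np.array([1.0, 0.0, 0.0]), h)
```

Filtering an impulse with this placeholder filter smears it over two samples, attenuating high frequencies as intended for a rear phantom source.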
  • Fig. 14 shows the ear responses for a physical center rear source, a phantom rear source and a physical center front source.
  • the coloration of the phantom source is clearly different from both physical sources, and clearly contains more high frequency content than the physical rear source. This may give rise to front-back confusions.
  • Fig. 15 illustrates the difference between the coloration of the phantom source and the physical rear source. This may be used for coloration compensation to compensate for the differences between the physical speaker and the phantom source. The compensation will vary with the position of the phantom source and the position of the physical sources used to create the phantom source.
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors.
  • an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Stereophonic System (AREA)

Abstract

According to the present invention, an apparatus generates signals for audio transducers from an audio signal such that the audio signal can be rendered spatially. The apparatus comprises a receiver (603) which receives position data indicating a desired rendering position for the audio signal. A drive signal generator (605) generates a first drive signal for a first audio transducer associated with a first position and a second drive signal for a second audio transducer associated with a second position. The first and second signals are generated using a panning operation for the audio signal on the basis of the desired rendering position. The drive signal generator (605) furthermore decorrelates the first drive signal relative to the second drive signal. The degree of decorrelation depends on the first position and may specifically depend on the distance (including an angular distance) between the loudspeakers and/or on the desired rendering position. The present invention may reduce the perceptibility of artifacts introduced by the panning.
PCT/IB2013/059875 2012-12-06 2013-11-04 Génération de signaux de commande pour transducteurs audio WO2014087277A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261733971P 2012-12-06 2012-12-06
US61/733,971 2012-12-06

Publications (1)

Publication Number Publication Date
WO2014087277A1 true WO2014087277A1 (fr) 2014-06-12

Family

ID=49641813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/059875 WO2014087277A1 (fr) 2012-12-06 2013-11-04 Génération de signaux de commande pour transducteurs audio

Country Status (1)

Country Link
WO (1) WO2014087277A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016040623A1 (fr) * 2014-09-12 2016-03-17 Dolby Laboratories Licensing Corporation Rendering audio objects in a reproduction environment that includes surround and/or height speakers
KR20170094078A (ko) * 2016-02-08 2017-08-17 Sony Corporation Ultrasonic speaker assembly for audio spatial effect
EP3220657A1 (fr) * 2016-03-16 2017-09-20 Sony Corporation Ultrasonic speaker for room mapping
TWI607655B (zh) * 2015-06-19 2017-12-01 Sony Corp Coding apparatus and method, decoding apparatus and method, and program
CN110431853A (zh) * 2017-03-29 2019-11-08 Sony Corporation Speaker device, audio data supply device, and audio data reproduction system
CN111800731A (zh) * 2019-04-03 2020-10-20 Yamaha Corporation Sound signal processing device and sound signal processing method
CN114846821A (zh) * 2019-12-18 2022-08-02 Dolby Laboratories Licensing Corporation Automatic localization of audio devices

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BREEBAART J ET AL: "Background, concept, and architecture for the recent MPEG surround standard on multichannel audio compression", JOURNAL OF THE AUDIO ENGINEERING SOCIETY, AUDIO ENGINEERING SOCIETY, NEW YORK, NY, US, vol. 55, no. 5, 1 May 2007 (2007-05-01), pages 331 - 351, XP008099918, ISSN: 0004-7554 *
KENDALL G S: "THE DECORRELATION OF AUDIO SIGNALS AND ITS IMPACT ON SPATIAL IMAGERY", COMPUTER MUSIC JOURNAL, CAMBRIDGE, MA, US, vol. 19, no. 4, 1 January 1995 (1995-01-01), pages 71 - 87, XP008026420, ISSN: 0148-9267 *
KHOURY S ET AL: "Volumetric modeling of acoustic fields in CNMAT's sound spatialization theatre", VISUALIZATION '98. PROCEEDINGS RESEARCH TRIANGLE PARK, NC, USA 18-23 OCT. 1998, PISCATAWAY, NJ, USA,IEEE, US, 1 January 1998 (1998-01-01), pages 439 - 442, XP031172561, ISBN: 978-0-8186-9176-8, DOI: 10.1109/VISUAL.1998.745338 *
PULKKI V.: "Virtual source positioning using vector base amplitude panning", JOURNAL OF THE AUDIO ENGINEERING SOCIETY, vol. 45, no. 6, 1997, pages 456 - 466, XP002719359
WILLIAM G GARDNER: "3-D Audio Using Loudspeakers", 1 September 1997 (1997-09-01), Massachusetts Institute of Technology, pages 1 - 153, XP055098835, Retrieved from the Internet <URL:http://sound.media.mit.edu/Papers/gardner_thesis.pdf> [retrieved on 20140128] *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106688253A (zh) * 2014-09-12 2017-05-17 Dolby Laboratories Licensing Corporation Rendering audio objects in a reproduction environment that includes surround and/or height speakers
WO2016040623A1 (fr) * 2014-09-12 2016-03-17 Dolby Laboratories Licensing Corporation Rendering audio objects in a reproduction environment that includes surround and/or height speakers
US20170289724A1 (en) * 2014-09-12 2017-10-05 Dolby Laboratories Licensing Corporation Rendering audio objects in a reproduction environment that includes surround and/or height speakers
US11170796B2 (en) 2015-06-19 2021-11-09 Sony Corporation Multiple metadata part-based encoding apparatus, encoding method, decoding apparatus, decoding method, and program
TWI607655B (zh) * 2015-06-19 2017-12-01 Sony Corp Coding apparatus and method, decoding apparatus and method, and program
KR101880844B1 (ko) * 2016-02-08 2018-07-20 Sony Corporation Ultrasonic speaker assembly for audio spatial effect
KR20170094078A (ko) * 2016-02-08 2017-08-17 Sony Corporation Ultrasonic speaker assembly for audio spatial effect
EP3220657A1 (fr) * 2016-03-16 2017-09-20 Sony Corporation Ultrasonic speaker for room mapping
CN107205202B (zh) * 2016-03-16 2020-03-20 Sony Corporation System, method and device for producing audio
CN107205202A (zh) * 2016-03-16 2017-09-26 Sony Corporation Ultrasonic speaker assembly with ultrasonic room mapping
CN110431853A (zh) * 2017-03-29 2019-11-08 Sony Corporation Speaker device, audio data supply device, and audio data reproduction system
CN110431853B (zh) * 2017-03-29 2022-05-31 Sony Corporation Speaker device, audio data supply device, and audio data reproduction system
CN111800731A (zh) * 2019-04-03 2020-10-20 Yamaha Corporation Sound signal processing device and sound signal processing method
US11089422B2 (en) * 2019-04-03 2021-08-10 Yamaha Corporation Sound signal processor and sound signal processing method
CN114846821A (zh) * 2019-12-18 2022-08-02 Dolby Laboratories Licensing Corporation Automatic localization of audio devices

Similar Documents

Publication Publication Date Title
US11503424B2 (en) Audio processing apparatus and method therefor
EP2805326B1 (fr) Spatial audio rendering and encoding
US10506358B2 (en) Binaural audio processing
US9973871B2 (en) Binaural audio processing with an early part, reverberation, and synchronization
KR101858479B1 (ko) Apparatus and method for mapping first and second input channels to at least one output channel
RU2752600C2 (ru) Method and apparatus for rendering an acoustic signal, and machine-readable recording medium
US9478228B2 (en) Encoding and decoding of audio signals
WO2014087277A1 (fr) Generation of drive signals for audio transducers
KR20120006060A (ko) Audio signal synthesizing
WO2014091375A1 (fr) Reverberation processing in an audio signal
US20150340043A1 (en) Multichannel encoder and decoder with efficient transmission of position information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13795586

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13795586

Country of ref document: EP

Kind code of ref document: A1