WO2013006338A2 - System and method for adaptive audio signal generation, coding and rendering - Google Patents


Info

Publication number
WO2013006338A2
Authority
WO
WIPO (PCT)
Prior art keywords
audio
metadata
playback
speakers
speaker
Prior art date
Application number
PCT/US2012/044388
Other languages
French (fr)
Other versions
WO2013006338A3 (en)
Inventor
Charles Q. Robinson
Nicolas R. Tsingos
Christophe Chabanne
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation filed Critical Dolby Laboratories Licensing Corporation
Priority to KR1020197003234A priority Critical patent/KR102003191B1/en
Priority to KR1020237041109A priority patent/KR20230170110A/en
Priority to BR122020001361-3A priority patent/BR122020001361B1/en
Priority to IL302167A priority patent/IL302167A/en
Priority to CN201280032058.3A priority patent/CN103650539B/en
Priority to IL295733A priority patent/IL295733B2/en
Priority to KR1020197020510A priority patent/KR102115723B1/en
Priority to US14/130,386 priority patent/US9179236B2/en
Priority to KR1020147037035A priority patent/KR101845226B1/en
Priority to MX2013014684A priority patent/MX2013014684A/en
Priority to KR1020207034194A priority patent/KR102406776B1/en
Priority to AU2012279357A priority patent/AU2012279357B2/en
Priority to IL291043A priority patent/IL291043B2/en
Priority to KR1020137034894A priority patent/KR101685447B1/en
Priority to JP2014518958A priority patent/JP5912179B2/en
Priority to UAA201400839A priority patent/UA114793C2/en
Priority to PL12743261T priority patent/PL2727383T3/en
Priority to ES12743261T priority patent/ES2871224T3/en
Priority to DK12743261.5T priority patent/DK2727383T3/en
Priority to BR112013033386-3A priority patent/BR112013033386B1/en
Priority to EP21169907.9A priority patent/EP3893521A1/en
Priority to CA2837893A priority patent/CA2837893C/en
Priority to EP12743261.5A priority patent/EP2727383B1/en
Priority to KR1020187008804A priority patent/KR101946795B1/en
Priority to RU2013158054A priority patent/RU2617553C2/en
Priority to KR1020207014372A priority patent/KR102185941B1/en
Priority to KR1020227018617A priority patent/KR102608968B1/en
Publication of WO2013006338A2 publication Critical patent/WO2013006338A2/en
Publication of WO2013006338A3 publication Critical patent/WO2013006338A3/en
Priority to IL230046A priority patent/IL230046A/en
Priority to US14/866,350 priority patent/US9467791B2/en
Priority to AU2016202227A priority patent/AU2016202227B2/en
Priority to IL245574A priority patent/IL245574A0/en
Priority to US15/263,279 priority patent/US9622009B2/en
Priority to US15/483,806 priority patent/US9800991B2/en
Priority to US15/672,656 priority patent/US9942688B2/en
Priority to US15/905,536 priority patent/US10057708B2/en
Priority to AU2018203734A priority patent/AU2018203734B2/en
Priority to US16/035,262 priority patent/US10165387B2/en
Priority to US16/207,006 priority patent/US10327092B2/en
Priority to IL265741A priority patent/IL265741B/en
Priority to AU2019204012A priority patent/AU2019204012B2/en
Priority to US16/443,268 priority patent/US10477339B2/en
Priority to US16/679,945 priority patent/US10904692B2/en
Priority to AU2020226984A priority patent/AU2020226984B2/en
Priority to IL277736A priority patent/IL277736B/en
Priority to US17/156,459 priority patent/US11412342B2/en
Priority to IL284585A priority patent/IL284585B/en
Priority to AU2021258043A priority patent/AU2021258043B2/en
Priority to US17/883,440 priority patent/US11962997B2/en
Priority to AU2023200502A priority patent/AU2023200502A1/en

Classifications

    • H04S7/00 Indicating arrangements; control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S7/308 Electronic adaptation dependent on speaker or headphone connection
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems of the pseudo five- or more-channel type, e.g. virtual surround
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H04S2400/01 Multi-channel (more than two input channels) sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTFs] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S2420/03 Application of parametric coding in stereophonic audio systems
    • H04S2420/11 Application of ambisonics in stereophonic audio systems
    • H04S2420/13 Application of wave-field synthesis in stereophonic audio systems
    • H04R27/00 Public address systems
    • H04R5/02 Spatial or constructional arrangements of loudspeakers
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Definitions

  • One or more implementations relate generally to audio signal processing, and more specifically to hybrid object and channel-based audio processing for use in cinema, home, and other environments.
  • Audio objects are audio signals with associated parametric source descriptions, such as apparent source position (e.g., 3D coordinates), apparent source width, and other parameters.
  • Object-based audio is increasingly being used for many current multimedia applications, such as digital movies, video games, simulators, and 3D video.
  • These four description formats are often associated with the one or more rendering technologies that convert the audio signals to speaker feeds.
  • Current rendering technologies include panning, in which the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered prior to distribution); Ambisonics, in which the microphone signals are converted to feeds for a scalable array of speakers (typically rendered after distribution); WFS (wave field synthesis) in which sound events are converted to the appropriate speaker signals to synthesize the sound field (typically rendered after distribution); and binaural, in which the L/R (left/right) binaural signals are delivered to the L/R ear, typically using headphones, but also by using speakers and crosstalk cancellation (rendered before or after distribution).
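  • As a concrete illustration of the panning approach, the following is a minimal sketch of a constant-power pan between two speakers; the specific law, two-speaker layout, and function names are illustrative assumptions rather than details taken from this disclosure.

```python
import math

def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Pan a mono sample between two speakers.

    pan: -1.0 = fully left, 0.0 = center, +1.0 = fully right.
    A constant-power (sine/cosine) law keeps perceived loudness roughly
    constant as the source moves between the speakers.
    """
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(theta), sample * math.sin(theta)

# Example: a source panned halfway toward the right speaker.
left, right = constant_power_pan(1.0, 0.5)
```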
  • the speaker-feed format is the most common because it is simple and effective. The best sonic results (most accurate, most reliable) are achieved by mixing/monitoring and distributing to the speaker feeds directly since there is no processing between the content creator and listener. If the playback system is known in advance, a speaker feed description generally provides the highest fidelity.
  • the model-based description is considered the most adaptable because it makes no assumptions about the rendering technology and is therefore most easily applied to any rendering technology. Though the model-based description efficiently captures spatial information it becomes very inefficient as the number of audio sources increases.
  • the surround 'zones' comprise an array of speakers, all of which carry the same audio information within each left surround or right surround zone. Such arrays may be effective for 'ambient' or diffuse surround effects; however, in everyday life many sound effects originate from randomly placed point sources.
  • ambient music may be played from apparently all around, while subtle but discrete sounds originate from specific points: a person chatting from one point, the clatter of a knife on a plate from another. Being able to place such sounds discretely around the auditorium can add a heightened sense of reality without being noticeably obvious.
  • Overhead sounds are also an important component of surround definition. In the real world, sounds originate from all directions, and not always from a single horizontal plane. An added sense of realism can be achieved if sound can be heard from overhead, in other words from the 'upper hemisphere.' Present systems, however, do not offer truly accurate reproduction of sound for different audio types in a variety of different playback environments. A great deal of processing, knowledge, and configuration of actual playback environments is required using existing systems to attempt accurate representation of location specific sounds, thus rendering current systems impractical for most applications.
  • timbral quality of some sounds can suffer from being reproduced by an array of speakers.
  • the ability to direct specific sounds to a single speaker gives the mixer the opportunity to eliminate the artifacts of array reproduction and deliver a more realistic experience to the audience.
  • surround speakers do not support the same full range of audio frequency and level that the large screen channels support. Historically, this has created issues for mixers, reducing their ability to freely move full-range sounds from screen to room. As a result, theatre owners have not felt compelled to upgrade their surround channel configuration, preventing the widespread adoption of higher quality installations.
  • Audio streams are transmitted along with metadata that describes the "mixer's intent" including desired position of the audio stream.
  • the position can be expressed as a named channel (from within the predefined channel configuration) or as three-dimensional position information.
  • This 'channels plus objects' format combines the benefits of channel-based and model-based audio scene description methods.
  • Audio data for the adaptive audio system comprises a number of independent monophonic audio streams. Each stream has associated with it metadata that specifies whether the stream is a channel-based or object-based stream.
  • Channel-based streams have rendering information encoded by means of channel name, while object-based streams have location information encoded through mathematical expressions in further associated metadata.
  • the original independent audio streams are packaged as a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent.
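  • A hypothetical sketch of how such independent monophonic streams and their per-stream metadata might be organized before serialization into a single bitstream is shown below; the field names and types are illustrative assumptions, not the actual format defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StreamMetadata:
    """Metadata carried with one monophonic audio stream."""
    is_object: bool                      # False = channel-based bed, True = audio object
    channel_name: Optional[str] = None   # e.g. "L", "C", "Rs" for channel-based streams
    position: Optional[tuple] = None     # allocentric (x, y, z) for object streams
    width: float = 0.0                   # apparent source size
    snap_to_speaker: bool = False        # render with exactly one speaker if set

@dataclass
class AdaptiveAudioProgram:
    """All monophonic streams of a program plus their metadata."""
    streams: list = field(default_factory=list)   # (samples, StreamMetadata) pairs

# Example: one channel-based bed stream and one object stream.
program = AdaptiveAudioProgram(streams=[
    ([0.0] * 480, StreamMetadata(is_object=False, channel_name="L")),
    ([0.0] * 480, StreamMetadata(is_object=True, position=(0.25, 0.9, 0.5))),
])
```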
  • the object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content. This enables sound to be optimally mixed for a particular playback environment that may be different from the mix environment experienced by the sound engineer.
  • the adaptive audio system improves the audio quality in different rooms through such benefits as improved room equalization and surround bass management, so that the speakers (whether on-screen or off-screen) can be freely addressed by the mixer without having to think about timbral matching.
  • the adaptive audio system adds the flexibility and power of dynamic audio objects into traditional channel-based workflows. These audio objects allow creators to control discrete sound elements irrespective of any specific playback speaker configurations, including overhead speakers.
  • the system also introduces new efficiencies to the postproduction process, allowing sound engineers to efficiently capture all of their intent and then in real-time monitor, or automatically generate, surround-sound 7.1 and 5.1 versions.
  • the adaptive audio system simplifies distribution by encapsulating the audio essence and artistic intent in a single track file within a digital cinema processor, which can be faithfully played back in a broad range of theatre configurations.
  • the system provides optimal reproduction of artistic intent when the mix and render stages use the same channel configuration, together with a single inventory with downward adaptation to the rendering configuration, i.e., downmixing.
  • FIG. 1 is a top-level overview of an audio creation and playback environment utilizing an adaptive audio system, under an embodiment.
  • FIG. 2 illustrates the combination of channel and object-based data to produce an adaptive audio mix, under an embodiment.
  • FIG. 3 is a block diagram illustrating the workflow of creating, packaging and rendering adaptive audio content, under an embodiment.
  • FIG. 4 is a block diagram of a rendering stage of an adaptive audio system, under an embodiment.
  • FIG. 5 is a table that lists the metadata types and associated metadata elements for the adaptive audio system, under an embodiment.
  • FIG. 6 is a diagram that illustrates a post-production and mastering for an adaptive audio system, under an embodiment.
  • FIG. 7 is a diagram of an example workflow for a digital cinema packaging process using adaptive audio files, under an embodiment.
  • FIG. 8 is an overhead view of an example layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium.
  • FIG. 9 is a front view of an example placement of suggested speaker locations at the screen for use in the typical auditorium.
  • FIG. 10 is a side view of an example layout of suggested speaker locations for use with an adaptive audio system in the typical auditorium.
  • FIG. 11 is an example of a positioning of top surround speakers and side surround speakers relative to the reference point, under an embodiment.
  • Systems and methods are described for an adaptive audio system and associated audio signal and data format that supports multiple rendering technologies. Aspects of the one or more embodiments described herein may be implemented in an audio or audio-visual system that processes source audio information in a mixing, rendering and playback system that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
  • Channel or Audio Channel: a monophonic audio signal or an audio stream plus metadata in which the position is coded as a channel ID, e.g., Left Front or Right Top Surround.
  • A channel may drive multiple speakers, e.g., the Left Surround (Ls) channel will feed all the speakers in the Ls array.
  • Channel Configuration: a pre-defined set of speaker zones with associated nominal locations, e.g., 5.1, 7.1, and so on;
  • 5.1 refers to a six-channel surround sound audio system having front left and right channels, center channel, two surround channels, and a subwoofer channel;
  • 7.1 refers to an eight-channel surround system that adds two additional surround channels to the 5.1 system. Examples of 5.1 and 7.1 configurations include Dolby® surround systems.
  • Speaker: an audio transducer or set of transducers that renders an audio signal.
  • Speaker Zone: an array of one or more speakers that can be uniquely referenced and that receives a single audio signal, e.g., Left Surround as typically found in cinema, in particular for exclusion from or inclusion in object rendering.
  • Speaker Channel or Speaker-feed Channel: an audio channel that is associated with a named speaker or speaker zone within a defined speaker configuration. A speaker channel is nominally rendered using the associated speaker zone.
  • Speaker Channel Group: a set of one or more speaker channels corresponding to a channel configuration (e.g., a stereo track, mono track, etc.).
  • Object or Object Channel: one or more audio channels with a parametric source description, such as apparent source position (e.g., 3D coordinates), apparent source width, etc.
  • Audio Program: the complete set of speaker channels and/or object channels and associated metadata that describes the desired spatial audio presentation.
  • Allocentric Reference: a spatial reference in which audio objects are defined relative to features within the rendering environment such as room walls and corners, standard speaker locations, and screen location (e.g., front left corner of a room).
  • Egocentric Reference: a spatial reference in which audio objects are defined relative to the perspective of the (audience) listener and often specified with respect to angles relative to the listener (e.g., 30 degrees right of the listener).
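  • For illustration, the sketch below translates an egocentric description (azimuth and distance from the listener) into allocentric room coordinates; the unit-square room, listener position, and function name are assumptions made for the example and are not part of this disclosure.

```python
import math

def egocentric_to_allocentric(azimuth_deg: float, distance: float,
                              listener_xy=(0.5, 0.5)) -> tuple[float, float]:
    """Map an egocentric source (azimuth in degrees to the right of straight
    ahead, distance in room units) to allocentric (x, y) room coordinates.

    Assumes the listener faces the screen along +y and the room spans
    [0, 1] on each axis; results are clamped to stay inside the room.
    """
    az = math.radians(azimuth_deg)
    x = listener_xy[0] + distance * math.sin(az)
    y = listener_xy[1] + distance * math.cos(az)
    return min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)

# "30 degrees to the right, half a room away" expressed allocentrically:
pos = egocentric_to_allocentric(30.0, 0.5)
```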
  • Frame: a short, independently decodable segment into which a total audio program is divided. The audio frame rate and boundaries are typically aligned with the video frames.
  • Adaptive Audio: channel-based and/or object-based audio signals plus metadata that render the audio signals based on the playback environment.
  • the cinema sound format and processing system described herein, also referred to as an "adaptive audio system," utilizes a new spatial audio description and rendering technology to allow enhanced audience immersion, more artistic control, system flexibility and scalability, and ease of installation and maintenance.
  • Embodiments of a cinema audio platform include several discrete components including mixing tools, packer/encoder, unpack/decoder, in-theater final mix and rendering components, new speaker designs, and networked amplifiers.
  • the system includes recommendations for a new channel configuration.
  • the system utilizes a model- based description that supports several features such as: single inventory with downward and upward adaption to rendering configuration, i.e., delay rendering and enabling optimal use of available speakers; improved sound envelopment, including optimized downmixing to avoid inter-channel correlation; increased spatial resolution through steer-thru arrays (e.g., an audio object dynamically assigned to one or more speakers within a surround array); and support for alternate rendering methods.
  • FIG. 1 is a top-level overview of an audio creation and playback environment utilizing an adaptive audio system, under an embodiment.
  • a comprehensive, end-to-end environment 100 includes content creation, packaging, distribution and playback/rendering components across a wide number of end-point devices and use cases.
  • the overall system 100 originates with content captured from and for a number of different use cases that comprise different user experiences 112.
  • the content capture element 102 includes, for example, cinema, TV, live broadcast, user generated content, recorded content, games, music, and the like, and may include audio/visual or pure audio content.
  • the content, as it progresses through the system 100 from the capture stage 102 to the final user experience 112, traverses several key processing steps through discrete system components.
  • process steps include pre-processing of the audio 104, authoring tools and processes 106, encoding by an audio codec 108 that captures, for example, audio data, additional metadata and reproduction information, and object channels.
  • Various processing effects such as compression (lossy or lossless), encryption, and the like may be applied to the object channels for efficient and secure distribution through various mediums.
  • Appropriate endpoint- specific decoding and rendering processes 110 are then applied to reproduce and convey a particular adaptive audio user experience 112.
  • the audio experience 112 represents the playback of the audio or audio/visual content through appropriate speakers and playback devices, and may represent any environment in which a listener is experiencing playback of the captured content, such as a cinema, concert hall, outdoor theater, a home or room, listening booth, car, game console, headphone or headset system, public address (PA) system, or any other playback environment.
  • the embodiment of system 100 includes an audio codec 108 that is capable of efficient distribution and storage of multichannel audio programs, and hence may be referred to as a 'hybrid' codec.
  • the codec 108 combines traditional channel-based audio data with associated metadata to produce audio objects that facilitate the creation and delivery of audio that is adapted and optimized for rendering and playback in environments that may be different from the mixing environment. This allows the sound engineer to encode his or her intent with respect to how the final audio should be heard by the listener, based on the actual listening environment of the listener.
  • a new form of audio coding called audio object coding provides distinct sound sources (audio objects) as input to the encoder in the form of separate audio streams.
  • audio objects include dialog tracks, single instruments, individual sound effects, and other point sources.
  • Each audio object is associated with spatial parameters, which may include, but are not limited to, sound position, sound width, and velocity information.
  • the audio objects and associated parameters are then coded for distribution and storage.
  • Final audio object mixing and rendering is performed at the receive end of the audio distribution chain, as part of audio program playback. This step may be based on knowledge of the actual speaker positions so that the result is an audio distribution system that is customizable to user-specific listening conditions.
  • the two coding forms, channel-based and object-based, perform optimally for different input signal conditions.
  • Channel-based audio coders are generally more efficient for coding input signals containing dense mixtures of different audio sources and for diffuse sounds.
  • audio object coders are more efficient for coding a small number of highly directional sound sources.
  • the methods and components of system 100 comprise an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements.
  • Other aspects of the described embodiments include extending a predefined channel-based audio codec in a backwards-compatible manner to include audio object coding elements.
  • a new 'extension layer' containing the audio object coding elements is defined and added to the 'base' or 'backwards compatible' layer of the channel-based audio codec bitstream.
  • This approach enables one or more bitstreams, which include the extension layer to be processed by legacy decoders, while providing an enhanced listener experience for users with new decoders.
  • One example of an enhanced user experience includes control of audio object rendering.
  • An additional advantage of this approach is that audio objects may be added or modified anywhere along the distribution chain without decoding/mixing/re-encoding multichannel audio encoded with the channel-based audio codec.
  • the spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are meant to emanate from a specific region of a viewing screen or room should be played through speaker(s) located at that same relative location.
  • the primary audio metadatum of a sound event in a model-based description is position, though other parameters such as size, orientation, velocity and acoustic dispersion can also be described.
  • a model-based, 3D, audio spatial description requires a 3D coordinate system.
  • the coordinate system used for transmission (Euclidean, spherical, etc.) is generally chosen for convenience or compactness; however, other coordinate systems may be used for the rendering processing.
  • a frame of reference is required for representing the locations of objects in space.
  • in an allocentric frame of reference, an audio source position is defined relative to features within the rendering environment such as room walls and corners, standard speaker locations, and screen location.
  • in an egocentric frame of reference, locations are represented with respect to the perspective of the listener, such as "in front of me, slightly to the left," and so on.
  • Scientific studies of spatial perception have shown that the egocentric perspective is used almost universally. For cinema, however, an allocentric reference is generally more appropriate for several reasons. For example, the precise location of an audio object is most important when there is an associated object on screen.
  • in some cases, however, an egocentric frame of reference may be useful and more appropriate. These include non-diegetic sounds, i.e., those that are not present in the "story space," e.g., mood music, for which an egocentrically uniform presentation may be desirable.
  • Another case is near- field effects (e.g., a buzzing mosquito in the listener's left ear) that require an egocentric representation.
  • infinitely far sound sources (and the resulting plane waves) appear to come from a constant egocentric position (e.g., 30 degrees to the left), and such sounds are easier to describe in egocentric terms than in allocentric terms.
  • Embodiments of the adaptive audio system include a hybrid spatial description approach that includes a recommended channel configuration for optimal fidelity and for rendering of diffuse or complex, multi-point sources (e.g., stadium crowd, ambience) using an egocentric reference, plus an allocentric, model-based sound description to efficiently enable increased spatial resolution and scalability.
  • the original sound content data 102 is first processed in a pre-processing block 104.
  • the pre-processing block 104 of system 100 includes an object channel filtering component.
  • audio objects contain individual sound sources to enable independent panning of sounds.
  • Embodiments include a method for isolating independent source signals from a more complex signal. Undesirable elements to be separated from independent source signals may include, but are not limited to, other independent sound sources and background noise. In addition, reverb may be removed to recover "dry" sound sources.
  • the pre-processor 104 also includes source separation and content type detection functionality.
  • the system provides for automated generation of metadata through analysis of input audio.
  • Positional metadata is derived from a multi-channel recording through an analysis of the relative levels of correlated input between channel pairs. Detection of content type, such as "speech" or "music," may be achieved, for example, by feature extraction and classification.
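  • As an illustration of deriving positional metadata from the relative levels of correlated content, the sketch below estimates a pan position for a single channel pair; the zero-lag correlation test and RMS balance rule are simplifying assumptions, not the analysis method specified in this disclosure.

```python
import numpy as np

def estimate_pan_position(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate a pan position in [-1, 1] for the component common to a
    channel pair from the relative levels of the correlated signal.

    The correlated part is approximated by the zero-lag cross-correlation,
    and the balance of per-channel energy decides the apparent position
    (-1 = left, +1 = right).
    """
    if float(np.mean(left * right)) <= 0.0:
        return 0.0                      # uncorrelated or diffuse: no stable position
    l_rms = float(np.sqrt(np.mean(left ** 2)))
    r_rms = float(np.sqrt(np.mean(right ** 2)))
    if l_rms + r_rms == 0.0:
        return 0.0
    return (r_rms - l_rms) / (r_rms + l_rms)

# Example: a source mixed mostly to the right channel.
t = np.linspace(0.0, 1.0, 48000)
src = np.sin(2 * np.pi * 440 * t)
print(estimate_pan_position(0.3 * src, 0.9 * src))   # roughly +0.5
```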
  • the authoring tools block 106 includes features to improve the authoring of audio programs by optimizing the input and codification of the sound engineer's creative intent, allowing him or her to create the final audio mix once in a form that is optimized for playback in practically any playback environment. This is accomplished through the use of audio objects and positional data that is associated and encoded with the original audio content. In order to accurately place sounds around an auditorium the sound engineer needs control over how the sound will ultimately be rendered based on the actual constraints and features of the playback environment. The adaptive audio system provides this control by allowing the sound engineer to change how the audio content is designed and mixed through the use of audio objects and positional data.
  • Audio objects can be considered as groups of sound elements that may be perceived to emanate from a particular physical location or locations in the auditorium. Such objects can be static, or they can move.
  • the audio objects are controlled by metadata, which among other things, details the position of the sound at a given point in time.
  • when objects are monitored or played back in a theatre, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a physical channel.
  • a track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen might pan in effectively the same way as with channel-based content, but content placed in the surrounds can be rendered to an individual speaker if desired.
  • the adaptive audio system supports 'beds' in addition to audio objects, where beds are effectively channel-based sub-mixes or stems. These can be delivered for final playback (rendering) either individually, or combined into a single bed, depending on the intent of the content creator. These beds can be created in different channel-based configurations such as 5.1 and 7.1, and are extensible to more extensive formats such as 9.1 and arrays that include overhead speakers.
  • FIG. 2 illustrates the combination of channel and object-based data to produce an adaptive audio mix, under an embodiment.
  • the channel-based data 202 which, for example, may be 5.1 or 7.1 surround sound data provided in the form of pulse-code modulated (PCM) data is combined with audio object data 204 to produce an adaptive audio mix 208.
  • the audio object data 204 is produced by combining the elements of the original channel-based data with associated metadata that specifies certain parameters pertaining to the location of the audio objects.
  • the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously.
  • an audio program could contain one or more speaker channels optionally organized into groups (or tracks, e.g. a stereo or 5.1 track), descriptive metadata for one or more speaker channels, one or more object channels, and descriptive metadata for one or more object channels.
  • each speaker channel group, and each object channel may be represented using one or more different sample rates.
  • Digital Cinema (D-Cinema) applications support 48 kHz and 96 kHz sample rates, but other sample rates may also be supported.
  • ingest, storage and editing of channels with different sample rates may also be supported.
  • the creation of an audio program requires the step of sound design, which includes combining sound elements as a sum of level adjusted constituent sound elements to create a new, desired sound effect.
  • the authoring tools of the adaptive audio system enable the creation of sound effects as a collection of sound objects with relative positions using a spatio-visual sound design graphical user interface.
  • For example, a visual representation of the sound-generating object (e.g., a car) can serve as a template for assembling its constituent audio elements (exhaust note, tire hum, engine noise) as objects with appropriate relative positions.
  • the individual object channels can then be linked and manipulated as a group.
  • the authoring tool 106 includes several user interface elements to allow the sound engineer to input control information and view mix parameters, and improve the system functionality.
  • the sound design and authoring process is also improved by allowing object channels and speaker channels to be linked and manipulated as a group.
  • One example is combining an object channel containing a discrete, dry sound source with a set of speaker channels that contain an associated reverb signal.
  • the audio authoring tool 106 supports the ability to combine multiple audio channels, commonly referred to as mixing. Multiple methods of mixing are supported, and may include traditional level-based mixing and loudness based mixing.
  • in level-based mixing, wideband scaling is applied to the audio channels, and the scaled audio channels are then summed together. The wideband scale factors for each channel are chosen to control the absolute level of the resulting mixed signal, and also the relative levels of the mixed channels within the mixed signal.
  • in loudness-based mixing, one or more input signals are modified using frequency-dependent amplitude scaling, where the frequency-dependent amplitude is chosen to provide the desired perceived absolute and relative loudness, while preserving the perceived timbre of the input sound.
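  • A minimal sketch of the level-based method follows (the loudness-based method would replace the scalar gains with frequency-dependent filtering); the function and variable names are illustrative assumptions.

```python
import numpy as np

def level_based_mix(channels: list[np.ndarray], gains: list[float]) -> np.ndarray:
    """Level-based mix: apply a wideband scale factor to each input channel,
    then sum. The gains set the absolute level of the mix and the relative
    levels of the contributing channels within it."""
    assert len(channels) == len(gains)
    mixed = np.zeros_like(channels[0])
    for signal, gain in zip(channels, gains):
        mixed += gain * signal
    return mixed

# Example: dialog kept prominent over a quieter music bed.
dialog = 0.1 * np.random.randn(48000)
music = 0.1 * np.random.randn(48000)
mix = level_based_mix([dialog, music], [1.0, 0.5])
```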
  • the authoring tools allow for the ability to create speaker channels and speaker channel groups. This allows metadata to be associated with each speaker channel group. Each speaker channel group can be tagged according to content type. The content type is extensible via a text description. Content types may include, but are not limited to, dialog, music, and effects. Each speaker channel group may be assigned unique instructions on how to upmix from one channel configuration to another, where upmixing is defined as the creation of M audio channels from N channels where M > N.
  • Upmix instructions may include, but are not limited to, the following: an enable/disable flag to indicate if upmixing is permitted; an upmix matrix to control the mapping between each input and output channel; and default enable and matrix settings may be assigned based on content type, e.g., enable upmixing for music only.
  • Each speaker channel group may also be assigned unique instructions on how to downmix from one channel configuration to another, where downmixing is defined as the creation of Y audio channels from X channels where Y < X.
  • Downmix instructions may include, but are not limited to, the following: a matrix to control the mapping between each input and output channel; and default matrix settings can be assigned based on content type, e.g., dialog shall downmix onto screen; effects shall downmix off the screen.
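  • A sketch of applying such a downmix matrix is given below; the 5.1-to-stereo coefficients shown are a common convention used purely for illustration, not values specified by this disclosure.

```python
import numpy as np

def apply_downmix(channels: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Downmix X input channels to Y output channels (Y < X) with a mapping
    matrix: output[y] = sum over x of matrix[y, x] * input[x].

    channels has shape (X, num_samples); matrix has shape (Y, X).
    """
    return matrix @ channels

# Illustrative 5.1 -> 2.0 matrix over channels (L, R, C, LFE, Ls, Rs).
downmix_5_1_to_stereo = np.array([
    # L    R    C      LFE  Ls     Rs
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],    # left output
    [0.0, 1.0, 0.707, 0.0, 0.0,   0.707],  # right output
])
surround = np.zeros((6, 48000))
stereo = apply_downmix(surround, downmix_5_1_to_stereo)
```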
  • Each speaker channel can also be associated with a metadata flag to disable bass management during rendering.
  • Embodiments include a feature that enables the creation of object channels and object channel groups.
  • This invention allows metadata to be associated with each object channel group.
  • Each object channel group can be tagged according to content type.
  • the content type is extensible via a text description, wherein the content types may include, but are not limited to, dialog, music, and effects.
  • Each object channel group can be assigned metadata to describe how the object(s) should be rendered.
  • Position information is provided to indicate the desired apparent source position.
  • Position may be indicated using an egocentric or allocentric frame of reference.
  • the egocentric reference is appropriate when the source position is to be referenced to the listener.
  • spherical coordinates are useful for position description.
  • An allocentric reference is the typical frame of reference for cinema or other audio/visual presentations where the source position is referenced relative to objects in the presentation environment such as a visual display screen or room boundaries.
  • Three-dimensional (3D) trajectory information is provided to enable the interpolation of position or for use in other rendering decisions such as enabling a "snap to" mode.
  • Size information is provided to indicate the desired apparent perceived audio source size.
  • Spatial quantization is provided through a "snap to closest speaker" control that indicates an intent by the sound engineer or mixer to have an object rendered by exactly one speaker (with some potential sacrifice to spatial accuracy).
  • a limit to the allowed spatial distortion can be indicated through elevation and azimuth tolerance thresholds such that if the threshold is exceeded, the "snap" function will not occur.
  • a crossfade rate parameter can be indicated to control how quickly a moving object will transition or jump from one speaker to another when the desired position crosses between two speakers.
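  • A sketch of how the "snap to closest speaker" control with tolerance thresholds might be evaluated is shown below; the angular speaker representation, default tolerances, and example layout are assumptions made for illustration.

```python
import math

def snap_to_closest_speaker(obj_pos, speakers,
                            azimuth_tol_deg=15.0, elevation_tol_deg=10.0):
    """Return the index of the speaker an object should 'snap' to, or None
    if the closest speaker would exceed the allowed angular error.

    obj_pos and each speaker position are (azimuth_deg, elevation_deg);
    the tolerance thresholds bound the accepted spatial distortion.
    """
    best_idx, best_err, best_components = None, float("inf"), (0.0, 0.0)
    for i, (az, el) in enumerate(speakers):
        az_err, el_err = abs(obj_pos[0] - az), abs(obj_pos[1] - el)
        err = math.hypot(az_err, el_err)
        if err < best_err:
            best_idx, best_err, best_components = i, err, (az_err, el_err)
    if best_idx is None:
        return None
    if best_components[0] > azimuth_tol_deg or best_components[1] > elevation_tol_deg:
        return None   # tolerance exceeded: keep panned rendering instead
    return best_idx

# Example: an object near the left surround zone snaps to that speaker.
layout = [(-30.0, 0.0), (30.0, 0.0), (0.0, 0.0), (-110.0, 0.0), (110.0, 0.0)]
print(snap_to_closest_speaker((-105.0, 3.0), layout))   # -> 3
```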
  • dependent spatial metadata is used for certain position metadata.
  • metadata can be automatically generated for a "slave" object by associating it with a "master" object that the slave object is to follow.
  • a time lag or relative speed can be assigned to the slave object.
  • Mechanisms may also be provided to allow for the definition of an acoustic center of gravity for sets or groups of objects, so that an object may be rendered such that it is perceived to move around another object. In such a case, one or more objects may rotate around an object or a defined area, such as a dominant point, or a dry area of the room. The acoustic center of gravity would then be used in the rendering stage to help determine location information for each appropriate object-based sound, even though the ultimate location information would be expressed as a location relative to the room, as opposed to a location relative to another object.
  • the speaker sets to be restricted may include, but are not limited to, any of the named speakers or speaker zones (e.g. L, C, R, etc.), or speaker areas, such as: front wall, back wall, left wall, right wall, ceiling, floor, speakers within the room, and so on.
  • the audio program description can be adapted for rendering on a wide variety of speaker installations and channel configurations.
  • when an audio program is authored, it is important to monitor the effect of rendering the program on anticipated playback configurations to verify that the desired results are achieved.
  • This invention includes the ability to select target playback configurations and monitor the result.
  • the system can automatically monitor the worst case (i.e. highest) signal levels that would be generated in each anticipated playback configuration, and provide an indication if clipping or limiting will occur.
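  • A minimal sketch of such worst-case level monitoring is shown below, assuming the anticipated playback configuration has already been rendered to per-speaker feeds; the feed names and full-scale convention are illustrative assumptions.

```python
import numpy as np

def check_clipping(rendered_feeds: dict[str, np.ndarray],
                   full_scale: float = 1.0) -> dict[str, float]:
    """Report the worst-case (highest) peak level per speaker feed of an
    anticipated playback configuration and flag feeds that would clip."""
    report = {}
    for name, feed in rendered_feeds.items():
        peak = float(np.max(np.abs(feed)))
        report[name] = peak
        if peak > full_scale:
            print(f"WARNING: feed '{name}' would clip (peak {peak:.2f})")
    return report

# Example: a hypothetical two-feed render; the centre feed would clip.
t = np.linspace(0.0, 1.0, 48000)
feeds = {"L": 0.7 * np.sin(2 * np.pi * 440 * t),
         "C": 1.3 * np.sin(2 * np.pi * 440 * t)}
levels = check_clipping(feeds)
```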
  • FIG. 3 is a block diagram illustrating the workflow of creating, packaging and rendering adaptive audio content, under an embodiment.
  • the workflow 300 of FIG. 3 is divided into three distinct task groups labeled creation/authoring, packaging, and exhibition.
  • the hybrid model of beds and objects shown in FIG. 2 allows most sound design, editing, pre-mixing, and final mixing to be performed in the same manner as they are today and without adding excessive overhead to present processes.
  • the adaptive audio functionality is provided in the form of software, firmware or circuitry that is used in conjunction with sound production and processing equipment, wherein such equipment may be new hardware systems or updates to existing systems.
  • plug-in applications may be provided for digital audio workstations to allow existing panning techniques within sound design and editing to remain unchanged. In this way, it is possible to lay down both beds and objects within the workstation in 5.1 or similar surround-equipped editing rooms.
  • Object audio and metadata is recorded in the session in preparation for the pre- and final-mix stages in the dubbing theatre.
  • the creation or authoring tasks involve inputting mixing controls 302 by a user, e.g., a sound engineer in the following example, to a mixing console or audio workstation 304.
  • metadata is integrated into the mixing console surface, allowing the channel strips' faders, panning and audio processing to work with both beds or stems and audio objects.
  • the metadata can be edited using either the console surface or the workstation user interface, and the sound is monitored using a rendering and mastering unit (RMU) 306.
  • the bed and object audio data and associated metadata is recorded during the mastering session to create a 'print master,' which includes an adaptive audio mix 310 and any other rendered deliverables (such as a surround 7.1 or 5.1 theatrical mix) 308.
  • Existing authoring tools may be used to allow sound engineers to label individual audio tracks within a mix session. Embodiments extend this concept by allowing users to label individual sub-segments within a track to aid in finding or quickly identifying audio elements.
  • the user interface to the mixing console that enables definition and creation of the metadata may be implemented through graphical user interface elements, physical controls (e.g., sliders and knobs), or any combination thereof.
  • the print master file is wrapped using industry-standard MXF wrapping procedures, hashed and optionally encrypted in order to ensure integrity of the audio content for delivery to the digital cinema packaging facility.
  • This step may be performed by a digital cinema processor (DCP) 312 or any appropriate audio processor depending on the ultimate playback environment, such as a standard surround-sound equipped theatre 318, an adaptive audio-enabled theatre 320, or any other playback environment.
  • the processor 312 outputs the appropriate audio signals 314 and 316 depending on the exhibition environment.
  • the adaptive audio print master contains an adaptive audio mix, along with a standard DCI-compliant Pulse Code Modulated (PCM) mix.
  • the PCM mix can be rendered by the rendering and mastering unit in a dubbing theatre, or created by a separate mix pass if desired.
  • PCM audio forms the standard main audio track file within the digital cinema processor 312, and the adaptive audio forms an additional track file.
  • Such a track file may be compliant with existing industry standards, and is ignored by DCI-compliant servers that cannot use it.
  • the DCP containing an adaptive audio track file is recognized by a server as a valid package, and ingested into the server and then streamed to an adaptive audio cinema processor.
  • the adaptive audio packaging scheme allows a single type of package to be delivered to a cinema.
  • the DCP package contains both PCM and adaptive audio files.
  • security keys such as a key delivery message (KDM) may be incorporated to enable secure delivery of movie content, or other similar content.
  • the adaptive audio methodology is realized by enabling a sound engineer to express his or her intent with regard to the rendering and playback of audio content through the audio workstation 304.
  • the engineer is able to specify where and how audio objects and sound elements are played back depending on the listening environment.
  • Metadata is generated in the audio workstation 304 in response to the engineer's mixing inputs 302 to provide rendering cues that control spatial parameters (e.g., position, velocity, intensity, timbre, etc.) and specify which speaker(s) or speaker groups in the listening environment play respective sounds during exhibition.
  • the metadata is associated with the respective audio data in the workstation 304 or RMU 306 for packaging and transport by DCP 312.
  • a graphical user interface and software tools that provide control of the workstation 304 by the engineer comprise at least part of the authoring tools 106 of FIG. 1.
  • system 100 includes a hybrid audio codec 108.
  • This component comprises an audio encoding, distribution, and decoding system that is configured to generate a single bitstream containing both conventional channel-based audio elements and audio object coding elements.
  • the hybrid audio coding system is built around a channel-based encoding system that is configured to generate a single (unified) bitstream that is simultaneously compatible with (i.e., decodable by) a first decoder configured to decode audio data encoded in accordance with a first encoding protocol (channel-based) and one or more secondary decoders configured to decode audio data encoded in accordance with one or more secondary encoding protocols (object-based).
  • the bitstream can include both encoded data (in the form of data bursts) decodable by the first decoder (and ignored by any secondary decoders) and encoded data (e.g., other bursts of data) decodable by one or more secondary decoders (and ignored by the first decoder).
  • the decoded audio and associated information (metadata) from the first and one or more of the secondary decoders can then be combined in a manner such that both the channel-based and object-based information is rendered simultaneously to recreate a facsimile of the environment, channels, spatial information, and objects presented to the hybrid coding system (i.e., within a 3D space or listening environment).
  • the codec 108 generates a bitstream containing coded audio information and information relating to multiple sets of channel positions (speakers).
  • one set of channel positions is fixed and used for the channel based encoding protocol
  • another set of channel positions is adaptive and used for the audio object based encoding protocol, such that the channel configuration for an audio object may change as a function of time (depending on where the object is placed in the sound field).
  • the hybrid audio coding system may carry information about two sets of speaker locations for playback, where one set may be fixed and be a subset of the other.
  • Devices supporting legacy coded audio information would decode and render the audio information from the fixed subset, while a device capable of supporting the larger set could decode and render the additional coded audio information that would be time-varyingly assigned to different speakers from the larger set.
  • the system is not dependent on the first and one or more of the secondary decoders being simultaneously present within a system and/or device.
  • a legacy and/or existing device/system containing only a decoder supporting the first protocol would yield a fully compatible sound field to be rendered via traditional channel-based reproduction systems.
  • the unknown or unsupported portion(s) of the hybrid-bitstream protocol (i.e., the audio information represented by a secondary encoding protocol) would be ignored by decoders that support only the first protocol.
  • the codec 108 is configured to operate in a mode where the first encoding subsystem (supporting the first protocol) contains a combined representation of all of the sound field information (channels and objects) that is also represented in the one or more secondary encoding subsystems.
  • the hybrid bitstream includes backward compatibility with decoders supporting only the first encoder subsystem's protocol by allowing audio objects (typically carried in one or more secondary encoder protocols) to be represented and rendered within decoders supporting only the first protocol.
  • the codec 108 includes two or more encoding subsystems, where each of these subsystems is configured to encode audio data in accordance with a different protocol, and is configured to combine the outputs of the subsystems to generate a hybrid-format (unified) bitstream.
  • One of the benefits of the embodiments is the ability for a hybrid coded audio bitstream to be carried over a wide range of content distribution systems, where each of the distribution systems conventionally supports only data encoded in accordance with the first encoding protocol. This eliminates the need for any system and/or transport level protocol modifications/changes in order to specifically support the hybrid coding system.
  • Audio encoding systems typically utilize standardized bitstream elements to enable the transport of additional (arbitrary) data within the bitstream itself.
  • This additional (arbitrary) data is typically skipped (i.e., ignored) during decoding of the encoded audio included in the bitstream, but may be used for a purpose other than decoding.
  • Different audio coding standards express these additional data fields using unique nomenclature.
  • Bitstream elements of this general type may include, but are not limited to, auxiliary data, skip fields, data stream elements, fill elements, ancillary data, and substream elements.
  • usage of the expression "auxiliary data" in this document does not imply a specific type or format of additional data, but rather should be interpreted as a generic expression that encompasses any or all of the examples associated with the present invention.
  • a data channel enabled via "auxiliary" bitstream elements of a first encoding protocol within a combined hybrid coding system bitstream could carry one or more secondary (independent or dependent) audio bitstreams (encoded in accordance with one or more secondary encoding protocols).
  • the one or more secondary audio bitstreams could be split into N-sample blocks and multiplexed into the "auxiliary data" fields of a first bitstream.
  • the first bitstream is decodable by an appropriate (complement) decoder.
  • the auxiliary data of the first bitstream could be extracted, recombined into one or more secondary audio bitstreams, decoded by a processor supporting the syntax of one or more of the secondary bitstreams, and then combined and rendered together or independently.
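  • One way the N-sample-block multiplexing described above could be realized is sketched below; the byte-string frame representation, block size, and function names are simplifying assumptions and do not reflect the actual bitstream syntax of any particular codec.

```python
def multiplex_secondary(primary_frames: list[bytes],
                        secondary_bitstream: bytes,
                        block_size: int) -> list[tuple[bytes, bytes]]:
    """Split a secondary (object-based) bitstream into fixed-size blocks and
    attach one block to the auxiliary-data field of each primary frame.
    A legacy decoder reads only the first element of each pair and skips
    the auxiliary block."""
    blocks = [secondary_bitstream[i:i + block_size]
              for i in range(0, len(secondary_bitstream), block_size)]
    blocks += [b""] * (len(primary_frames) - len(blocks))   # pad if short
    return list(zip(primary_frames, blocks))

def demultiplex_secondary(hybrid_frames: list[tuple[bytes, bytes]]) -> bytes:
    """Recombine the auxiliary blocks back into the secondary bitstream."""
    return b"".join(aux for _, aux in hybrid_frames)

# Round-trip example.
frames = [b"frame0", b"frame1", b"frame2"]
objects = b"object-metadata-and-audio"
hybrid = multiplex_secondary(frames, objects, block_size=10)
assert demultiplex_secondary(hybrid) == objects
```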
  • Bitstream elements associated with a secondary encoding protocol also carry and convey information (metadata) describing characteristics of the underlying audio, which may include, but are not limited to, desired sound source position, velocity, and size.
  • This metadata is utilized during the decoding and rendering processes to re-create the proper (i.e., original) position for the associated audio object carried within the applicable bitstream. It is also possible to carry the metadata described above, which is applicable to the audio objects contained in the one or more secondary bitstreams present in the hybrid stream, within bitstream elements associated with the first encoding protocol.
  • Bitstream elements associated with either or both the first and second encoding protocols of the hybrid coding system carry/convey contextual metadata that identify spatial parameters (i.e., the essence of the signal properties itself) and further information describing the underlying audio essence type in the form of specific audio classes that are carried within the hybrid coded audio bitstream.
  • Such metadata could indicate, for example, the presence of spoken dialogue, music, dialogue over music, applause, singing voice, etc., and could be utilized to adaptively modify the behavior of interconnected pre- or post-processing modules upstream or downstream of the hybrid coding system.
  • the codec 108 is configured to operate with a shared or common bit pool in which bits available for coding are "shared" between all or part of the encoding subsystems supporting one or more protocols.
  • a codec may distribute the available bits (from the common "shared" bit pool) between the encoding subsystems in order to optimize the overall audio quality of the unified bitstream. For example, during a first time interval, the codec may assign more of the available bits to a first encoding subsystem, and fewer of the available bits to the remaining subsystems, while during a second time interval, the codec may assign fewer of the available bits to the first encoding subsystem, and more of the available bits to the remaining subsystems.
  • the decision of how to assign bits between encoding subsystems may be dependent, for example, on results of statistical analysis of the shared bit pool, and/or analysis of the audio content encoded by each subsystem.
  • the codec may allocate bits from the shared pool in such a way that a unified bitstream constructed by multiplexing the outputs of the encoding subsystems maintains a constant frame length/bitrate over a specific time interval. It is also possible, in some cases, for the frame length/bitrate of the unified bitstream to vary over a specific time interval.
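The shared bit pool described above can be illustrated with a minimal sketch, under the assumption that each subsystem's demand is summarized by a single number (for example a signal-energy or perceptual-entropy estimate) and that the per-frame budget is fixed so the unified bitstream keeps a constant frame length. The demand metric and budget value are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch of a shared bit pool: per frame, the total bit budget is
# fixed (constant frame length/bitrate) and split between encoding
# subsystems in proportion to an estimate of each subsystem's demand.

def allocate_bits(demands, total_bits):
    """Split total_bits between subsystems in proportion to demand,
    keeping the sum exactly equal to total_bits."""
    total_demand = sum(demands) or 1.0
    raw = [total_bits * d / total_demand for d in demands]
    alloc = [int(b) for b in raw]
    # distribute the rounding remainder to the largest fractional parts
    remainder = total_bits - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:remainder]:
        alloc[i] += 1
    return alloc

# Example: during one interval the first subsystem is busier, during the
# next the second subsystem needs more of the shared pool.
print(allocate_bits([0.8, 0.2], total_bits=1536 * 8))  # first time interval
print(allocate_bits([0.3, 0.7], total_bits=1536 * 8))  # second time interval
```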
  • the codec 108 generates a unified bitstream including data encoded in accordance with the first encoding protocol configured and transmitted as an independent substream of an encoded data stream (which a decoder supporting the first encoding protocol will decode), and data encoded in accordance with a second protocol sent as an independent or dependent substream of the encoded data stream (one which a decoder supporting the first protocol will ignore). More generally, in a class of embodiments the codec generates a unified bitstream including two or more independent or dependent substreams (where each substream includes data encoded in accordance with a different or identical encoding protocol).
  • the codec 108 generates a unified bitstream including data encoded in accordance with the first encoding protocol configured and transmitted with a unique bitstream identifier (which a decoder supporting a first encoding protocol associated with the unique bitstream identifier will decode), and data encoded in accordance with a second protocol configured and transmitted with a unique bitstream identifier, which a decoder supporting the first protocol will ignore. More generally, in a class of embodiments the codec generates a unified bitstream including two or more substreams (where each substream includes data encoded in accordance with a different or identical encoding protocol and where each carries a unique bitstream identifier).
  • the methods and systems for creating a unified bitstream described above provide the ability to unambiguously signal (to a decoder) which interleaving and/or protocol has been utilized within a hybrid bitstream (e.g., to signal whether the AUX data, SKIP, DSE or the substream approach described above is utilized).
  • the hybrid coding system is configured to support de-interleaving/demultiplexing and re-interleaving/re-multiplexing of bitstreams supporting one or more secondary protocols into a first bitstream (supporting a first protocol) at any processing point found throughout a media delivery system.
  • the hybrid codec is also configured to be capable of encoding audio input streams with different sample rates into one bitstream. This provides a means for efficiently coding and distributing audio sources containing signals with inherently different bandwidths. For example, dialog tracks typically have inherently lower bandwidth than music and effects tracks.
  • the adaptive audio system allows multiple (e.g., up to 128) tracks to be packaged, usually as a combination of beds and objects.
  • the basic format of the audio data for the adaptive audio system comprises a number of independent monophonic audio streams. Each stream has associated with it metadata that specifies whether the stream is a channel-based stream or an object-based stream.
  • the channel-based streams have rendering information encoded by means of channel name or label; and the object-based streams have location information encoded through mathematical expressions encoded in further associated metadata.
  • the original independent audio streams are then packaged as a single serial bitstream that contains all of the audio data in an ordered fashion.
  • This adaptive data configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the ultimate rendering location of a sound is based on the playback environment to correspond to the mixer's intent.
  • a sound can be specified to originate from a frame of reference of the playback room (e.g., middle of left wall), rather than a specific labeled speaker or speaker group (e.g., left surround).
  • the object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
  • FIG. 4 is a block diagram of a rendering stage of an adaptive audio system, under an embodiment.
  • a number of input signals such as up to 128 audio tracks that comprise the adaptive audio signals 402 are provided by certain components of the creation, authoring and packaging stages of system 300, such as RMU 306 and processor 312. These signals comprise the channel-based beds and objects that are utilized by the renderer 404.
  • the channel-based audio (beds) and objects are input to a level manager 406 that provides control over the output levels or amplitudes of the different audio components.
  • Certain audio components may be processed by an array correction component 408.
  • the adaptive audio signals are then passed through a B-chain processing component 410, which generates a number (e.g., up to 64) of speaker feed output signals.
  • the B-chain feeds refer to the signals processed by power amplifiers, crossovers and speakers, as opposed to A-chain content that constitutes the sound track on the film stock.
  • the renderer 404 runs a rendering algorithm that intelligently uses the surround speakers in the theatre to the best of their ability.
  • objects being panned between screen and surround speakers can maintain their sound pressure level and have a closer timbre match without, importantly, increasing the overall sound pressure level in the theatre.
  • An array of appropriately-specified surround speakers will typically have sufficient headroom to reproduce the maximum dynamic range available within a surround 7.1 or 5.1 soundtrack (i.e., 20 dB above reference level); however, it is unlikely that a single surround speaker will have the same headroom as a large multi-way screen speaker.
  • the adaptive audio system improves the quality and power handling of surround speakers to provide an improvement in the faithfulness of the rendering. It provides support for bass management of the surround speakers through the use of optional rear subwoofers that allow each surround speaker to achieve improved power handling while potentially utilizing smaller speaker cabinets. It also allows the addition of side surround speakers closer to the screen than current practice to ensure that objects can smoothly transition from screen to surround.
  • system 400 provides a comprehensive, flexible method for content creators to move beyond the constraints of existing systems.
  • current systems create and distribute audio that is fixed to particular speaker locations with limited knowledge of the type of content conveyed in the audio essence (the part of the audio that is played back).
  • the adaptive audio system 100 provides a new hybrid approach that includes the option for both speaker location specific audio (left channel, right channel, etc.) and object-oriented audio elements that have generalized spatial information which may include, but is not limited to, position, size and velocity.
  • This hybrid approach provides a balance between fidelity (provided by fixed speaker locations) and flexibility in rendering (generalized audio objects).
  • the system also provides additional useful information about the audio content that is paired with the audio essence by the content creator at the time of content creation.
  • This pairing provides detailed information on the attributes of the audio that can be used in powerful ways during rendering.
  • attributes may include, but are not limited to, content type (dialog, music, effect, Foley, background/ambience, etc.), spatial attributes (3D position, 3D size, velocity), and rendering information (snap to speaker location, channel weights, gain, bass management information, etc.).
  • the adaptive audio system described herein provides powerful information that can be used for rendering by a widely varying number of end points.
  • the optimal rendering technique applied depends greatly on the end point device.
  • home theater systems and soundbars may have 2, 3, 5, 7 or even 9 separate speakers.
  • Many other types of systems, such as televisions, computers, and music docks have only two speakers, and nearly all commonly used devices have a binaural headphone output (PC, laptop, tablet, cell phone, music player, etc.).
  • the end point devices often need to make simplistic decisions and compromises to render and reproduce audio that is now distributed in a channel/speaker specific form.
  • the adaptive audio system 100 provides this information and, potentially, access to audio objects, which can be used to create a compelling next generation user experience.
  • the system 100 allows the content creator to embed the spatial intent of the mix within the bitstream using metadata such as position, size, velocity, and so on, through a unique and powerful metadata and adaptive audio transmission format. This allows a great deal of flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, adaptive audio enables the adaptation of the mix to the exact position of the speakers in a particular room in order to avoid spatial distortion that occurs when the geometry of the playback system is not identical to the authoring system. In current audio reproduction systems where only audio for a speaker channel is sent, the intent of the content creator is unknown.
  • System 100 uses metadata conveyed throughout the creation and distribution pipeline. An adaptive audio-aware reproduction system can use this metadata information to reproduce the content in a manner that matches the original intent of the content creator.
  • the mix can be adapted to the exact hardware configuration of the reproduction system.
  • rendering equipment such as televisions, home theaters, soundbars, portable music player docks, etc.
  • When these systems are sent channel-specific audio information today (i.e., left and right channel audio or multichannel audio), the system must process the audio to appropriately match the capabilities of the rendering equipment.
  • An example is standard stereo audio being sent to a soundbar with more than two speakers.
  • the intent of the content creator is unknown.
  • an adaptive audio aware reproduction system can use this information to reproduce the content in a manner that matches the original intent of the content creator. For example, some soundbars have side firing speakers to create a sense of envelopment. With adaptive audio, spatial information and content type (such as ambient effects) can be used by the soundbar to send only the appropriate audio to these side firing speakers.
  • the adaptive audio system allows for unlimited interpolation of speakers in a system on all front/back, left/right, up/down, near/far dimensions.
  • no information exists for how to handle audio where it may be desired to position the audio such that it is perceived by a listener to be between two speakers.
  • a spatial quantization factor is introduced.
  • the spatial positioning of the audio can be known accurately and reproduced accordingly on the audio reproduction system.
  • the spatial information conveyed by the adaptive audio system can be not only used by a content creator to create a compelling entertainment experience (film, television, music, etc.), but the spatial information can also indicate where a listener is positioned relative to physical objects such as buildings or geographic points of interest. This would allow the user to interact with a virtualized audio experience that is related to the real-world, i.e., augmented reality.
  • Embodiments also enable spatial upmixing by reading the metadata to perform enhanced upmixing only if the object audio data are not available. Knowing the position of all objects and their types allows the upmixer to better differentiate elements within the channel-based tracks.
  • Existing upmixing algorithms have to infer information such as the audio content type (speech, music, ambient effects) as well as the position of different elements within the audio stream to create a high quality upmix with minimal or no audible artifacts. Many times the inferred information may be incorrect or inappropriate.
  • the additional information available from the metadata related to, for example, audio content type, spatial position, velocity, audio object size, etc. can be used by an upmixing algorithm to create a high quality reproduction result.
  • the system also spatially matches the audio to the video by accurately positioning the audio objects on the screen to corresponding visual elements.
  • a compelling audio/video reproduction experience is possible, particularly with larger screen sizes, if the reproduced spatial locations of some audio elements match image elements on the screen.
  • An example is having the dialog in a film or television program spatially coincide with a person or character that is speaking on the screen. With normal speaker channel based audio there is no easy method to determine where the dialog should be spatially positioned to match the location of the person or character on-screen. With the audio information available with adaptive audio, such audio/visual alignment can be achieved.
  • the visual positional and audio spatial alignment can also be used for non-character/dialog objects such as cars, trucks, animation, and so on.
  • spatial masking processing is facilitated by system 100, since knowledge of the spatial intent of a mix through the adaptive audio metadata means that the mix can be adapted to any speaker configuration.
  • spatial masking may be anticipated by the renderer, and the spatial and/or loudness downmix parameters of each object may be adjusted so all audio elements of the mix remain just as perceptible as in the original mix.
  • Because the renderer understands the spatial relationship between the mix and the playback system, it has the ability to "snap" objects to the closest speakers instead of creating a phantom image between two or more speakers. While this may slightly distort the spatial representation of the mix, it also allows the renderer to avoid an unintended phantom image. For example, if the angular position of the mixing stage's left speaker does not correspond to the angular position of the playback system's left speaker, using the snap to closest speaker function could avoid having the playback system reproduce a constant phantom image of the mixing stage's left channel.
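A minimal sketch of the snap-to-closest-speaker decision described above follows, assuming each speaker and each object position is summarized by an azimuth angle; the layout angles and the object azimuth are illustrative, not values defined by the system.

```python
# Minimal sketch of "snap to closest speaker": rather than panning a phantom
# image between speakers, route the object to the playback speaker whose
# azimuth is closest to the authored azimuth. Layout values are illustrative.
import math

def closest_speaker(object_azimuth_deg, speaker_azimuths_deg):
    def angular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(speaker_azimuths_deg,
               key=lambda az: angular_distance(object_azimuth_deg, az))

# Mixing-stage left channel authored at -30 degrees; the playback room's
# left speaker sits at -25 degrees, so the object snaps to that speaker.
playback_layout = {-110: "Lss", -25: "L", 0: "C", 25: "R", 110: "Rss"}
snap_az = closest_speaker(-30.0, list(playback_layout))
print(playback_layout[snap_az])   # -> "L"
```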
  • the adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a large amount of flexibility in the processing of audio prior to reproduction. From a content processing and rendering standpoint, the adaptive audio system enables processing to be adapted to the type of object. For example, dialog enhancement may be applied to dialog objects only. Dialog enhancement refers to a method of processing audio that contains dialog such that the audibility and/or intelligibility of the dialog is increased and/or improved.
  • the audio processing that is applied to dialog is often inappropriate for non-dialog audio content (e.g., music, ambient effects, etc.) and can result in objectionable audible artifacts.
  • an audio object could contain only the dialog in a piece of content, and it can be labeled accordingly so that a rendering solution could selectively apply dialog enhancement to only the dialog content.
  • the dialog enhancement processing can process dialog exclusively (thereby limiting any processing being performed on any other content).
  • Bass management (filtering, attenuation, gain) refers to selectively isolating and processing only the bass (or lower) frequencies in a particular piece of content. With current audio systems and delivery mechanisms this is a "blind" process that is applied to all of the audio. With adaptive audio, specific audio objects for which bass management is appropriate can be identified by the metadata, and the rendering processing can be applied appropriately.
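A minimal sketch of metadata-driven bass management is shown below; the object/metadata field names and the one-pole crossover are illustrative assumptions standing in for a real crossover filter, not the system's actual processing.

```python
# Minimal sketch of metadata-driven bass management: only objects whose
# metadata flags them for bass management have their low band extracted and
# routed to the subwoofer feed.
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def bass_manage(objects, cutoff_hz=80.0):
    """objects: list of dicts with 'samples' and 'metadata'
    (metadata may carry an optional 'bass_manage' flag)."""
    sub_feed = None
    for obj in objects:
        if not obj["metadata"].get("bass_manage", False):
            continue
        low = one_pole_lowpass(obj["samples"], cutoff_hz)
        sub_feed = low if sub_feed is None else [a + b for a, b in zip(sub_feed, low)]
        # remove the routed low band from the object's own feed
        obj["samples"] = [x - l for x, l in zip(obj["samples"], low)]
    return sub_feed
```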
  • the adaptive audio system 100 also provides for object based dynamic range compression and selective upmixing.
  • Traditional audio tracks have the same duration as the content itself, while an audio object might occur for only a limited amount of time in the content.
  • the metadata associated with an object can contain information about its average and peak signal amplitude, as well as its onset or attack time (particularly for transient material). This information would allow a compressor to better adapt its compression and time constants (attack, release, etc.) to better suit the content.
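As an illustration of how such metadata could steer a compressor, the following is a minimal sketch of an envelope-follower compressor whose attack and release time constants are read from object metadata; the field names, default values, and the specific gain law are assumptions for illustration only.

```python
# Minimal sketch of object-based dynamic range compression: attack and
# release time constants come from the object's metadata (e.g., authored
# for transient material) rather than from a single blind, track-wide setting.
import math

def compress(samples, metadata, sample_rate=48000):
    threshold = metadata.get("threshold", 0.5)   # linear amplitude, assumed field
    ratio     = metadata.get("ratio", 4.0)
    atk = math.exp(-1.0 / (sample_rate * metadata.get("attack_s", 0.005)))
    rel = math.exp(-1.0 / (sample_rate * metadata.get("release_s", 0.100)))
    env, out = 0.0, []
    for x in samples:
        mag = abs(x)
        coeff = atk if mag > env else rel
        env = coeff * env + (1.0 - coeff) * mag   # envelope follower
        gain = (threshold + (env - threshold) / ratio) / env if env > threshold else 1.0
        out.append(x * gain)
    return out
```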
  • content creators might choose to indicate in the adaptive audio bitstream whether an object should be upmixed or not. This information allows the adaptive audio renderer and upmixer to distinguish which audio elements can be safely upmixed, while respecting the creator's intent.
  • Embodiments also allow the adaptive audio system to select a preferred rendering algorithm from a number of available rendering algorithms and/or surround sound formats.
  • available rendering algorithms include: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata.
  • Others include dual balance, and vector-based amplitude panning.
  • the binaural distribution format uses a two-channel representation of a sound field in terms of the signal present at the left and right ears. Binaural information can be created via in-ear recording or synthesized using HRTF models. Playback of a binaural representation is typically done over headphones, or by employing cross-talk cancellation. Playback over an arbitrary speaker set-up would require signal analysis to determine the associated sound field and/or signal source(s).
  • the stereo dipole rendering method is a transaural cross-talk cancellation process to make binaural signals playable over stereo speakers (e.g., at + and - 10 degrees off center).
  • Ambisonics is both a distribution format and a rendering method that encodes the sound field in a four-channel form called B-format.
  • the first channel, W, is the non-directional pressure signal
  • the second channel, X, is the directional pressure gradient containing the front and back information
  • the third channel, Y, contains the left and right information, and the fourth channel, Z, the up and down information.
  • These channels define a first order sample of the complete soundfield at a point.
  • Ambisonics uses all available speakers to recreate the sampled (or synthesized) soundfield within the speaker array such that when some speakers are pushing, others are pulling.
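The first-order encoding behind the W, X, Y and Z channels described above can be sketched as follows; the 1/sqrt(2) weighting on W follows the traditional B-format convention, and decoding to a particular speaker array is a separate, layout-dependent step not shown here.

```python
# Minimal sketch of first-order Ambisonics (B-format) encoding of a mono
# source at a given azimuth/elevation. Azimuth is measured counterclockwise
# from the front, so +90 degrees is hard left.
import math

def encode_bformat(sample, azimuth_deg, elevation_deg):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                 # omnidirectional pressure
    x = sample * math.cos(az) * math.cos(el)    # front/back
    y = sample * math.sin(az) * math.cos(el)    # left/right
    z = sample * math.sin(el)                   # up/down
    return w, x, y, z

print(encode_bformat(1.0, azimuth_deg=90.0, elevation_deg=0.0))  # hard left
```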
  • Wave Field Synthesis is a rendering method of sound reproduction, based on the precise construction of the desired wave field by secondary sources.
  • WFS is based on Huygens' principle, and is implemented as speaker arrays (tens or hundreds) that ring the listening space and operate in a coordinated, phased fashion to re-create each individual sound wave.
  • Multi-channel panning is a distribution format and/or rendering method, and may be referred to as channel-based audio.
  • sound is represented as a number of discrete sources to be played back through an equal number of speakers at defined angles from the listener.
  • the content creator / mixer can create virtual images by panning signals between adjacent channels to provide direction cues; early reflections, reverb, etc., can be mixed into many channels to provide direction and environmental cues.
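A minimal sketch of the pairwise panning underlying such virtual images follows, using a constant-power sin/cos law between two adjacent channels; the specific pan law is a common convention assumed here for illustration, not mandated by the format.

```python
# Minimal sketch of pairwise constant-power panning between two adjacent
# channels: gains follow a sin/cos law so the summed power stays constant
# as a virtual image moves between the two speakers.
import math

def pan_pair(position):
    """position in [0, 1]: 0 = fully in speaker A, 1 = fully in speaker B."""
    theta = position * math.pi / 2.0
    return math.cos(theta), math.sin(theta)   # (gain_a, gain_b)

for p in (0.0, 0.25, 0.5, 1.0):
    ga, gb = pan_pair(p)
    print(f"pos={p:.2f}  gains=({ga:.3f}, {gb:.3f})  power={ga*ga + gb*gb:.3f}")
```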
  • Raw stems with position metadata is a distribution format, and may also be referred to as object-based audio.
  • In this format, distinct, "close mic'ed" sound sources are represented along with position and environmental metadata. Virtual sources are rendered based on the metadata and the playback equipment and listening environment.
  • the adaptive audio format is a hybrid of the multi-channel panning format and the raw stems format.
  • the rendering method in a present embodiment is multi-channel panning. For the audio channels, the rendering (panning) happens at authoring time, while for objects the rendering (panning) happens at playback.
  • Metadata is generated during the creation stage to encode certain positional information for the audio objects and to accompany an audio program to aid in rendering the audio program, and in particular, to describe the audio program in a way that enables rendering the audio program on a wide variety of playback equipment and playback environments.
  • the metadata is generated for a given program by the editors and mixers that create, collect, edit and manipulate the audio during post-production.
  • An important feature of the adaptive audio format is the ability to control how the audio will translate to playback systems and environments that differ from the mix environment. In particular, a given cinema may have lesser capabilities than the mix environment.
  • the adaptive audio renderer is designed to make the best use of the equipment available to re-create the mixer's intent. Further, the adaptive audio authoring tools allow the mixer to preview and adjust how the mix will be rendered on a variety of playback configurations. All of the metadata values can be conditioned on the playback environment and speaker configuration. For example, a different mix level for a given audio element can be specified based on the playback configuration or mode. In an embodiment, the list of conditioned playback modes is extensible and includes the following: (1) channel-based only playback: 5.1, 7.1, 7.1 (height), 9.1; and (2) discrete speaker playback: 3D, 2D (no height).
  • the metadata controls or dictates different aspects of the adaptive audio content and is organized based on different types including: program metadata, audio metadata, and rendering metadata (for channel and object).
  • Each type of metadata includes one or more metadata items that provide values for characteristics that are referenced by an identifier (ID).
  • FIG. 5 is a table that lists the metadata types and associated metadata elements for the adaptive audio system, under an embodiment.
  • the first type of metadata is program metadata, which includes metadata elements that specify the frame rate, track count, extensible channel description, and mix stage description.
  • the frame rate metadata element specifies the rate of the audio content frames in units of frames per second (fps).
  • the raw audio format need not include framing of the audio or metadata since the audio is provided as full tracks (duration of a reel or entire feature) rather than audio segments (duration of an object).
  • the raw format does need to carry all the information required to enable the adaptive audio encoder to frame the audio and metadata, including the actual frame rate.
  • Table 1 shows the ID, example values and description of the frame rate metadata element.
  • the track count metadata element indicates the number of audio tracks in a frame.
  • An example adaptive audio decoder/processor can support up to 128 simultaneous audio tracks, while the adaptive audio format will support any number of audio tracks.
  • Table 2 shows the ID, example values and description of the track count metadata element.
  • Channel-based audio can be assigned to non-standard channels and the extensible channel description metadata element enables mixes to use new channel positions.
  • the following metadata shall be provided as shown in Table 3:
  • the mix stage description metadata element specifies, among other values, the frequency at which a particular speaker produces half the power of the passband, and a speaker number (an integer that identifies each speaker).
  • the second type of metadata is audio metadata.
  • Each channel-based or object-based audio element consists of audio essence and metadata.
  • the audio essence is a monophonic audio stream carried on one of many audio tracks.
  • the associated metadata describes how the audio essence is stored (audio metadata, e.g., sample rate) or how it should be rendered (rendering metadata, e.g., desired audio source position).
  • audio tracks are continuous through the duration of the audio program.
  • the program editor or mixer is responsible for assigning audio elements to tracks.
  • the track use is expected to be sparse, i.e. median simultaneous track use may be only 16 to 32. In a typical implementation, the audio will be efficiently transmitted using a lossless encoder.
  • the format consists of up to 128 audio tracks where each track has a single sample rate and a single coding system. Each track lasts the duration of the feature (no explicit reel support).
  • the mapping of objects to tracks (time multiplexing) is the responsibility of the content creator (mixer).
  • the audio metadata includes the elements of sample rate, bit depth, and coding systems.
  • Table 5 shows the ID, example values and description of the sample rate metadata element.
  • the SampleRate field shall provide the sample rate; example values include 16, 24, 32, 44.1, 48, 88.2, and 96 (kHz).
  • Table 6 shows the ID, example values and description of the bit depth metadata element (for PCM and lossless compression).
  • the BitDepth field is a positive integer up to 32, giving an indication of the sample bit depth.
  • Table 7 shows the ID, example values and description of the coding system metadata element.
  • each audio track can be assigned any supported coding type
  • the AudioType field indicates the audio type (e.g., dialog, music, effects, m&e); the list shall be extensible, e.g., to indicate stems.
  • the third type of metadata is rendering metadata.
  • the rendering metadata specifies values that help the renderer to match as closely as possible the original mixer intent regardless of the playback environment.
  • the set of metadata elements are different for channel-based audio and object-based audio.
  • a first rendering metadata field selects between the two types of audio - channel-based or object-based, as shown in Table 8.
  • the rendering metadata for the channel-based audio comprises a position metadata element that specifies the audio source position as one or more speaker positions.
  • Table 9 shows the ID and values for the position metadata element for the channel-based case.
  • the position and extent of extension channel(s) is provided by ExtChanPos and associated metadata.
  • the rendering metadata for the channel-based audio also comprises a rendering control element that specifies certain characteristics with regard to playback of channel-based audio, as shown in Table 10.
  • the metadata includes analogous elements as for the channel-based audio.
  • Table 11 provides the ID and values for the object position metadata element.
  • Object position is described in one of three ways: three-dimensional co-ordinates; a plane and two-dimensional co-ordinates; or a line and a one-dimensional co-ordinate.
  • the rendering method can adapt based on the position information type.
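The three object-position description types mentioned above can be illustrated with a minimal sketch that normalizes each to a single 3D point for rendering; the plane and line names, field names, and room-relative coordinate convention are assumptions for illustration, not the format's actual syntax.

```python
# Minimal sketch of the three object-position description types (full 3D
# coordinates, a named plane plus 2D coordinates, or a named line plus a 1D
# coordinate), normalized to a 3D point. Names and conventions are assumed.

PLANES = {"screen": 0.0, "room_mid": 0.5, "rear_wall": 1.0}       # depth (y)
LINES = {"screen_width": ((0.0, 0.0, 0.5), (1.0, 0.0, 0.5))}      # start, end

def to_xyz(position_md):
    kind = position_md["type"]
    if kind == "3d":
        return tuple(position_md["xyz"])
    if kind == "plane_2d":
        x, z = position_md["xz"]
        return (x, PLANES[position_md["plane"]], z)
    if kind == "line_1d":
        (x0, y0, z0), (x1, y1, z1) = LINES[position_md["line"]]
        t = position_md["t"]
        return (x0 + t * (x1 - x0), y0 + t * (y1 - y0), z0 + t * (z1 - z0))
    raise ValueError(f"unknown position type: {kind}")

print(to_xyz({"type": "plane_2d", "plane": "screen", "xz": (0.25, 0.5)}))
```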
  • Supported speaker zones include: L, C, R, Lss, Rss, Lrs, Rrs, Lts, Rts, Lc, Rc. Speaker zone list shall be extensible to support future zones.
  • Channel Configuration list shall be extensible and include 5.1 and Dolby Surround 7.1. Object may be attenuated or eliminated completely when rendering to smaller channel configurations.
  • rendering data could be modified directly (e.g., pan trajectory, etc.).
  • the metadata described above and illustrated in FIG. 5 is generated and stored as one or more files that are associated or indexed with corresponding audio content so that audio streams are processed by the adaptive audio system interpreting the metadata generated by the mixer.
  • the metadata described above is an example set of IDs, values, and definitions, and other or additional metadata elements may be included for use in the adaptive audio system.
  • two (or more) sets of metadata elements are associated with each of the channel and object based audio streams.
  • a first set of metadata is applied to the plurality of audio streams for a first condition of the playback environment
  • a second set of metadata is applied to the plurality of audio streams for a second condition of the playback environment.
  • the second or subsequent set of metadata elements replaces the first set of metadata elements for a given audio stream based on the condition of the playback environment.
  • the condition may include factors such as room size, shape, composition of material within the room, present occupancy and density of people in the room, ambient noise characteristics, ambient light characteristics, and any other factor that might affect the sound or even mood of the playback environment.
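A minimal sketch of selecting between such conditional metadata sets at playback time is given below; the condition keys, the "most fields matched" rule, and the gain values are illustrative assumptions rather than the system's defined behavior.

```python
# Minimal sketch of conditional metadata: each stream carries more than one
# metadata set, each tagged with the playback-environment condition it was
# authored for, and the renderer applies the best match at playback time.

def select_metadata(metadata_sets, environment):
    """Pick the set whose declared conditions match the most environment
    fields; falls back to the first (default) set on a tie."""
    def score(md_set):
        cond = md_set.get("condition", {})
        return sum(1 for k, v in cond.items() if environment.get(k) == v)
    return max(metadata_sets, key=score)

stream_metadata = [
    {"condition": {},                                          "gain_db": 0.0},
    {"condition": {"room_size": "small"},                      "gain_db": -3.0},
    {"condition": {"room_size": "large", "occupancy": "full"}, "gain_db": +2.0},
]
print(select_metadata(stream_metadata, {"room_size": "large", "occupancy": "full"}))
```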
  • the rendering stage 110 of the adaptive audio processing system 100 may include audio post-production steps that lead to the creation of a final mix.
  • the three main categories of sound used in a movie mix are dialogue, music, and effects.
  • Effects consist of sounds that are not dialogue or music (e.g., ambient noise).
  • Sound effects can be recorded or synthesized by the sound designer or they can be sourced from effects libraries.
  • a sub-group of effects that involve specific noise sources (e.g., footsteps, doors, etc.) are known as Foley and are performed by Foley actors.
  • The different types of sound are marked and panned accordingly by the recording engineers.
  • FIG. 6 illustrates an example workflow for a post-production process in an adaptive audio system, under an embodiment.
  • the re-recording mixer(s) 604 use the premixes (also known as the 'mix minus') along with the individual sound objects and positional data to create stems as a way of grouping, for example, dialogue, music, effects, Foley and background sounds.
  • the music and all effects stems can be used as a basis for creating dubbed language versions of the movie.
  • Each stem consists of a channel-based bed and several audio objects with metadata.
  • the rendering and mastering unit 608 renders the audio to the speaker locations in the dubbing theatre. This rendering allows the mixers to hear how the channel- based beds and audio objects combine, and also provides the ability to render to different configurations.
  • the mixer can use conditional metadata, which default to relevant profiles, to control how the content is rendered to surround channels. In this way, the mixers retain complete control of how the movie plays back in all the scalable environments.
  • a monitoring step may be included after either or both of the re-recording step 604 and the final mix step 606 to allow the mixer to hear and evaluate the intermediate content generated during each of these stages.
  • the stems, objects, and metadata are brought together in an adaptive audio package 614, which is produced by the printmaster 610.
  • This package also contains the backward-compatible (legacy 5.1 or 7.1) surround sound theatrical mix 612.
  • the rendering/mastering unit (RMU) 608 can render this output if desired, thereby eliminating the need for any additional workflow steps in generating existing channel-based deliverables.
  • the audio files are packaged using standard Material Exchange Format (MXF) wrapping.
  • the adaptive audio mix master file can also be used to generate other deliverables, such as consumer multi-channel or stereo mixes.
  • the intelligent profiles and conditional metadata allow controlled renderings that can significantly reduce the time required to create such mixes.
  • a packaging system can be used to create a digital cinema package for the deliverables including an adaptive audio mix.
  • the audio track files may be locked together to help prevent synchronization errors with the adaptive audio track files.
  • the speaker array in the playback environment may comprise any number of surround-sound speakers placed and designated in accordance with established surround sound standards. Any number of additional speakers for accurate rendering of the object-based audio content may also be placed based on the condition of the playback environment. These additional speakers may be set up by a sound engineer, and this set up is provided to the system in the form of a set-up file that is used by the system for rendering the object-based components of the adaptive audio to a specific speaker or speakers within the overall speaker array.
  • the set-up file includes at least a list of speaker designations and a mapping of channels to individual speakers, information regarding grouping of speakers, and a run-time mapping based on a relative position of speakers to the playback environment.
  • the run-time mapping is utilized by a snap-to feature of the system that renders point source object-based audio content to a specific speaker that is nearest to the perceived location of the sound as intended by the sound engineer.
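As an illustration of the kind of information such a set-up file might carry, the following minimal sketch shows speaker designations, a channel-to-speaker mapping, speaker groupings, and room-relative positions usable by a run-time mapping; all field names and values are assumptions for illustration, not a defined file format.

```python
# Minimal sketch of a playback set-up file: speaker designations, channel
# mapping, groupings, and positions for the run-time (e.g., snap-to) mapping.
import json

setup_file = json.dumps({
    "speakers": [
        {"name": "L",   "position": [0.2, 0.0, 0.5], "group": "screen"},
        {"name": "C",   "position": [0.5, 0.0, 0.5], "group": "screen"},
        {"name": "R",   "position": [0.8, 0.0, 0.5], "group": "screen"},
        {"name": "Lss", "position": [0.0, 0.5, 0.5], "group": "side_surround"},
        {"name": "Rss", "position": [1.0, 0.5, 0.5], "group": "side_surround"},
    ],
    "channel_map": {"1": "L", "2": "R", "3": "C"},
})

setup = json.loads(setup_file)
screen_speakers = [s["name"] for s in setup["speakers"] if s["group"] == "screen"]
print(screen_speakers)            # -> ['L', 'C', 'R']
print(setup["channel_map"]["1"])  # -> 'L'
```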
  • FIG. 7 is a diagram of an example workflow for a digital cinema packaging process using adaptive audio files, under an embodiment.
  • the audio files comprising both the adaptive audio files and the 5.1 or 7.1 surround sound audio files are input to a wrapping/encryption block 704.
  • the PCM MXF file (with appropriate additional tracks appended) is encrypted using SMPTE specifications in accordance with existing practice.
  • the adaptive audio MXF is packaged as an auxiliary track file, and is optionally encrypted using a symmetric content key per the SMPTE specification.
  • This single DCP 708 can then be delivered to any Digital Cinema Initiatives (DCI) compliant server.
  • the wrapping/encryption component 704 may also provide input directly to a distribution KDM block 710 for generating an appropriate security key for use in the digital cinema server.
  • Other movie elements or files, such as subtitles 714 and images 716 may be wrapped and encrypted along with the audio files 702. In this case, certain processing steps may be included, such as compression 712 in the case of image files 716.
  • the adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a great deal of flexibility in the content management of audio. From a content management standpoint, adaptive audio methods enable several different features. These include changing the language of content by only replacing the dialog object for space saving, download efficiency, geographical playback adaptation, etc. Film, television and other entertainment programs are typically distributed internationally. This often requires that the language in the piece of content be changed depending on where it will be reproduced (French for films being shown in France, German for TV programs being shown in Germany, etc.). Today this often requires a completely independent audio soundtrack to be created, packaged and distributed.
  • the dialog for a piece of content could be an independent audio object.
  • This allows the language of the content to be easily changed without updating or altering other elements of the audio soundtrack such as music, effects, etc. This would not only apply to foreign languages but also inappropriate language for certain audiences (e.g., children's television shows, airline movies, etc.), targeted advertising, and so on.
  • the adaptive audio file format and associated processors allow for changes in how theatre equipment is installed, calibrated and maintained. With the introduction of many more potential speaker outputs, each individually equalized and balanced, there is a need for intelligent and time-efficient automatic room equalization, with the ability to manually adjust any automated room equalization.
  • the adaptive audio system uses an optimized 1/12th-octave band equalization engine. Up to 64 outputs can be processed to more accurately balance the sound in the theatre.
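To give a sense of the frequency resolution involved, the following minimal sketch computes 1/12th-octave band centers; the 1 kHz reference and the 20 Hz to 20 kHz span are assumptions for illustration, not parameters of the equalization engine described above.

```python
# Minimal sketch of the frequency grid implied by a 1/12th-octave
# equalization engine: band centers spaced one twelfth of an octave apart.
centers = [1000.0 * 2 ** (k / 12.0) for k in range(-72, 60)]
centers = [f for f in centers if 20.0 <= f <= 20000.0]
print(f"{len(centers)} bands from {centers[0]:.1f} Hz to {centers[-1]:.1f} Hz")
```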
  • the system also allows scheduled monitoring of the individual speaker outputs, from cinema processor output right through to the sound reproduced in the auditorium. Local or network alerts can be created to ensure that appropriate action is taken.
  • the flexible rendering system may automatically remove a damaged speaker or amplifier from the replay chain and render around it, so allowing the show to go on.
  • the cinema processor can be connected to the digital cinema server with existing 8xAES main audio connections, and an Ethernet connection for streaming adaptive audio data. Playback of surround 7.1 or 5.1 content uses the existing PCM connections.
  • the adaptive audio data is streamed over Ethernet to the cinema processor for decoding and rendering, and communication between the server and the cinema processor allows the audio to be identified and synchronized. In the event of any issue with the adaptive audio track playback, sound is reverted back to the Dolby Surround 7.1 or 5.1 PCM audio.
  • the adaptive audio system is designed to allow both content creators and exhibitors to decide how sound content is to be rendered in different playback speaker configurations.
  • the ideal number of speaker output channels used will vary according to room size. Recommended speaker placement is thus dependent on many factors, such as size, composition, seating configuration, environment, average audience sizes, and so on.
  • Example or representative speaker configurations and layouts are provided herein for purposes of illustration only, and are not intended to limit the scope of any claimed embodiments.
  • the recommended layout of speakers for an adaptive audio system remains compatible with existing cinema systems, which is vital so as not to compromise the playback of existing 5.1 and 7.1 channel-based formats.
  • the positions of existing screen channels should not be altered too radically in an effort to heighten or accentuate the introduction of new speaker locations.
  • the adaptive audio format is capable of being accurately rendered in the cinema to speaker configurations such as 7.1, thus allowing the format (and associated benefits) to be used in existing theatres with no change to amplifiers or speakers.
  • the adaptive audio is intended to be truly adaptable and capable of accurate playback in a variety of auditoriums, whether they have a limited number of playback channels or many channels with highly flexible configurations.
  • FIG. 8 is an overhead view 800 of an example layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium.
  • FIG. 9 is a front view 900 of the example layout of suggested speaker locations at the screen of the auditorium.
  • the reference position referred to hereafter corresponds to a position 2/3 of the distance back from the screen to the rear wall, on the center line of the screen.
  • Standard screen speakers 801 are shown in their usual positions relative to the screen.
  • additional speakers 804 behind the screen such as Left Center (Lc) and Right Center (Rc) screen speakers (in the locations of Left Extra and Right Extra channels in 70 mm film formats), can be beneficial in creating smoother pans across the screen.
  • Such optional speakers are thus recommended, particularly in auditoria with screens greater than 12 m (40 ft.) wide. All screen speakers should be angled such that they are aimed towards the reference position. The recommended placement of the subwoofer 810 behind the screen should remain unchanged, including maintaining asymmetric cabinet placement with respect to the center of the room, to prevent stimulation of standing waves. Additional subwoofers 816 may be placed at the rear of the theatre.
  • Surround speakers 802 should be individually wired back to the amplifier rack, and be individually amplified where possible with a dedicated channel of power amplification matching the power handling of the speaker in accordance with the manufacturer's specifications. Ideally, surround speakers should be specified to handle an increased SPL for each individual speaker, and also with wider frequency response where possible. As a rule of thumb for an average-sized theatre, the spacing of surround speakers should be between 2 and 3 m (6'6" and 9'9"), with left and right surround speakers placed symmetrically.
  • the spacing of surround speakers is most effectively considered as angles subtended from a given listener between adjacent speakers, as opposed to using absolute distances between speakers.
  • the angular distance between adjacent speakers should be 30 degrees or less, referenced from each of the four corners of the prime listening area. Good results can be achieved with spacing up to 50 degrees.
  • the speakers should maintain equal linear spacing adjacent to the seating area where possible. The linear spacing beyond the listening area, e.g. between the front row and the screen, can be slightly larger.
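A minimal sketch of the angular-spacing check described above follows, computing the angle subtended between adjacent surround speakers as seen from a corner of the prime listening area; the room coordinates are illustrative assumptions in metres.

```python
# Minimal sketch: angle subtended between adjacent surround speakers from a
# corner of the prime listening area; 30 degrees or less is recommended,
# with up to about 50 degrees still giving good results.
import math

def subtended_angle(listener, spk_a, spk_b):
    def angle_to(p):
        return math.atan2(p[1] - listener[1], p[0] - listener[0])
    d = math.degrees(abs(angle_to(spk_a) - angle_to(spk_b)))
    return min(d, 360.0 - d)

listener_corner = (4.0, 3.0)                                   # front-left corner
side_surrounds = [(0.0, y) for y in (4.0, 6.5, 9.0, 11.5)]     # along the left wall
for a, b in zip(side_surrounds, side_surrounds[1:]):
    ang = subtended_angle(listener_corner, a, b)
    print(f"{a} -> {b}: {ang:.1f} degrees  {'OK' if ang <= 30 else 'too wide'}")
```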
  • FIG. 11 is an example of a positioning of top surround speakers 808 and side surround speakers 806 relative to the reference position, under an embodiment.
  • Additional side surround speakers 806 should be mounted closer to the screen than the currently recommended practice of starting approximately one-third of the distance to the back of the auditorium. These speakers are not used as side surrounds during playback of Dolby Surround 7.1 or 5.1 soundtracks, but will enable smooth transition and improved timbre matching when panning objects from the screen speakers to the surround zones.
  • the surround arrays should be placed as low as practical, subject to the following constraints: the vertical placement of surround speakers at the front of the array should be reasonably close to the height of screen speaker acoustic center, and high enough to maintain good coverage across the seating area according to the directivity of the speaker.
  • the vertical placement of the surround speakers should be such that they form a straight line from front to back, and (typically) slanted upward so the relative elevation of surround speakers above the listeners is maintained toward the back of the cinema as the seating elevation increases, as shown in FIG. 10, which is a side view of an example layout of suggested speaker locations for use with an adaptive audio system in the typical auditorium. In practice, this can be achieved most simply by choosing the elevation for the front-most and rear-most side surround speakers, and placing the remaining speakers in a line between these points.
  • the side surround 806 and rear speakers 816 and top surrounds 808 should be aimed towards the reference position in the theatre, under defined guidelines regarding spacing, position, angle, and so on.
  • Embodiments of the adaptive audio cinema system and format achieve improved levels of audience immersion and engagement over present systems by offering powerful new authoring tools to mixers, and a new cinema processor featuring a flexible rendering engine that optimizes the audio quality and surround effects of the soundtrack to each room's speaker layout and characteristics.
  • the system maintains backwards compatibility and minimizes the impact on the current production and distribution workflows.
  • While embodiments have been described with respect to examples and implementations in a cinema environment in which the adaptive audio content is associated with film content for use in digital cinema processing systems, it should be noted that embodiments may also be implemented in non-cinema environments.
  • the adaptive audio content comprising object-based audio and channel-based audio may be used in conjunction with any related content (associated audio, video, graphic, etc.), or it may constitute standalone audio content.
  • the playback environment may be any appropriate listening environment from headphones or near field monitors to small or large rooms, cars, open air arenas, concert halls, and so on.
  • aspects of the system 100 may be implemented in an appropriate computer-based sound processing network environment for processing digital or digitized audio files.
  • Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers.
  • a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.
  • one or more machines may be configured to access the Internet through web browser programs.
  • One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.

Abstract

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.

Description

SYSTEM AND METHOD FOR ADAPTIVE AUDIO SIGNAL GENERATION, CODING AND RENDERING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 61/504,005 filed 1 July 2011 and U.S. Provisional Application No. 61/636,429 filed 20 April 2012, both of which are hereby incorporated by reference in entirety for all purposes.
TECHNICAL FIELD
[0002] One or more implementations relate generally to audio signal processing, and more specifically to hybrid object and channel-based audio processing for use in cinema, home, and other environments.
BACKGROUND
[0003] The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
[0004] Ever since the introduction of sound with film, there has been a steady evolution of technology used to capture the creator's artistic intent for the motion picture sound track and to accurately reproduce it in a cinema environment. A fundamental role of cinema sound is to support the story being shown on screen. Typical cinema sound tracks comprise many different sound elements corresponding to elements and images on the screen, dialog, noises, and sound effects that emanate from different on-screen elements and combine with background music and ambient effects to create the overall audience experience. The artistic intent of the creators and producers represents their desire to have these sounds reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement and other similar parameters.
[0005] Current cinema authoring, distribution and playback suffer from limitations that constrain the creation of truly immersive and lifelike audio. Traditional channel-based audio systems send audio content in the form of speaker feeds to individual speakers in a playback environment, such as stereo and 5.1 systems. The introduction of digital cinema has created new standards for sound on film, such as the incorporation of up to 16 channels of audio to allow for greater creativity for content creators, and a more enveloping and realistic auditory experience for audiences. The introduction of 7.1 surround systems has provided a new format that increases the number of surround channels by splitting the existing left and right surround channels into four zones, thus increasing the scope for sound designers and mixers to control positioning of audio elements in the theatre.
[0006] To further improve the listener experience, playback of sound in virtual three- dimensional environments has become an area of increased research and development. The spatial presentation of sound utilizes audio objects, which are audio signals with associated parametric source descriptions of apparent source position (e.g., 3D coordinates), apparent source width, and other parameters. Object-based audio is increasingly being used for many current multimedia applications, such as digital movies, video games, simulators, and 3D video.
[0007] Expanding beyond traditional speaker feeds and channel-based audio as a means for distributing spatial audio is critical, and there has been considerable interest in a model- based audio description which holds the promise of allowing the listener/exhibitor the freedom to select a playback configuration that suits their individual needs or budget, with the audio rendered specifically for their chosen configuration. At a high level, there are four main spatial audio description formats at present: speaker feed in which the audio is described as signals intended for speakers at nominal speaker positions; microphone feed in which the audio is described as signals captured by virtual or actual microphones in a predefined array; model-based description in which the audio is described in terms of a sequence of audio events at described positions; and binaural in which the audio is described by the signals that arrive at the listeners ears. These four description formats are often associated with the one or more rendering technologies that convert the audio signals to speaker feeds. Current rendering technologies include panning, in which the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered prior to distribution); Ambisonics, in which the microphone signals are converted to feeds for a scalable array of speakers (typically rendered after distribution); WFS (wave field synthesis) in which sound events are converted to the appropriate speaker signals to synthesize the sound field (typically rendered after distribution); and binaural, in which the L/R (left/right) binaural signals are delivered to the L/R ear, typically using headphones, but also by using speakers and crosstalk cancellation (rendered before or after distribution). Of these formats, the speaker-feed format is the most common because it is simple and effective. The best sonic results (most accurate, most reliable) are achieved by mixing/monitoring and distributing to the speaker feeds directly since there is no processing between the content creator and listener. If the playback system is known in advance, a speaker feed description generally provides the highest fidelity.
However, in many practical applications, the playback system is not known. The model-based description is considered the most adaptable because it makes no assumptions about the rendering technology and is therefore most easily applied to any rendering technology. Though the model-based description efficiently captures spatial information, it becomes very inefficient as the number of audio sources increases.
[0008] For many years, cinema systems have featured discrete screen channels in the form of left, center, right and occasionally 'inner left' and 'inner right' channels. These discrete sources generally have sufficient frequency response and power handling to allow sounds to be accurately placed in different areas of the screen, and to permit timbre matching as sounds are moved or panned between locations. Recent developments in improving the listener experience attempt to accurately reproduce the location of the sounds relative to the listener. In a 5.1 setup, the surround 'zones' comprise of an array of speakers, all of which carry the same audio information within each left surround or right surround zone. Such arrays may be effective with 'ambient' or diffuse surround effects, however, in everyday life many sound effects originate from randomly placed point sources. For example, in a restaurant, ambient music may be played from apparently all around, while subtle but discrete sounds originate from specific points: a person chatting from one point, the clatter of a knife on a plate from another. Being able to place such sounds discretely around the auditorium can add a heightened sense of reality without being noticeably obvious. Overhead sounds are also an important component of surround definition. In the real world, sounds originate from all directions, and not always from a single horizontal plane. An added sense of realism can be achieved if sound can be heard from overhead, in other words from the 'upper hemisphere.' Present systems, however, do not offer truly accurate reproduction of sound for different audio types in a variety of different playback environments. A great deal of processing, knowledge, and configuration of actual playback environments is required using existing systems to attempt accurate representation of location specific sounds, thus rendering current systems impractical for most applications.
[0009] What is needed is a system that supports multiple screen channels, resulting in increased definition and improved audio-visual coherence for on-screen sounds or dialog, and the ability to precisely position sources anywhere in the surround zones to improve the audio-visual transition from screen to room. For example, if a character on screen looks inside the room towards a sound source, the sound engineer ("mixer") should have the ability to precisely position the sound so that it matches the character's line of sight, and the effect will be consistent throughout the audience. In a traditional 5.1 or 7.1 surround sound mix, however, the effect is highly dependent on the seating position of the listener, which is disadvantageous for most large-scale listening environments. Increased surround resolution creates new opportunities to use sound in a room-centric way as opposed to the traditional approach, where content is created assuming a single listener at the "sweet spot."
[0010] Aside from the spatial issues, current state-of-the-art multi-channel systems also suffer with regard to timbre. For example, the timbral quality of some sounds, such as steam hissing out of a broken pipe, can suffer from being reproduced by an array of speakers. The ability to direct specific sounds to a single speaker gives the mixer the opportunity to eliminate the artifacts of array reproduction and deliver a more realistic experience to the audience. Traditionally, surround speakers do not support the same full range of audio frequency and level that the large screen channels support. Historically, this has created issues for mixers, reducing their ability to freely move full-range sounds from screen to room. As a result, theatre owners have not felt compelled to upgrade their surround channel configuration, preventing the widespread adoption of higher quality installations.
BRIEF SUMMARY OF EMBODIMENTS
[0011] Systems and methods are described for a cinema sound format and processing system that includes a new speaker layout (channel configuration) and an associated spatial description format. An adaptive audio system and format is defined that supports multiple rendering technologies. Audio streams are transmitted along with metadata that describes the "mixer's intent," including the desired position of the audio stream. The position can be expressed as a named channel (from within the predefined channel configuration) or as three-dimensional position information. This 'channels plus objects' format combines optimal channel-based and model-based audio scene description methods. Audio data for the adaptive audio system comprises a number of independent monophonic audio streams. Each stream has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; object-based streams have location information encoded through mathematical expressions in further associated metadata. The original independent audio streams are packaged as a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content. This enables sound to be optimally mixed for a particular playback environment that may be different from the mix environment experienced by the sound engineer.
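As a minimal, hypothetical sketch of the channels-plus-objects idea summarized above (field names are illustrative only and do not reflect the actual bitstream syntax), each independent monophonic stream carries metadata marking it as channel-based or object-based:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AudioStream:
    """One independent monophonic stream plus its descriptive metadata.

    Hypothetical field names; the real bitstream syntax is not reproduced here.
    """
    samples: list                       # monophonic PCM samples
    is_object: bool                     # False = channel-based, True = object-based
    channel_name: Optional[str] = None  # e.g. "L", "C", "Rs" for channel-based streams
    position: Optional[Tuple[float, float, float]] = None  # allocentric (x, y, z) for objects

# A channel bed element and a dynamic object element in the same program:
bed_left = AudioStream(samples=[], is_object=False, channel_name="L")
fly_over = AudioStream(samples=[], is_object=True, position=(0.5, 0.2, 1.0))
```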
[0012] The adaptive audio system improves the audio quality in different rooms through such benefits as improved room equalization and surround bass management, so that the speakers (whether on-screen or off-screen) can be freely addressed by the mixer without having to think about timbral matching. The adaptive audio system adds the flexibility and power of dynamic audio objects into traditional channel-based workflows. These audio objects allow creators to control discrete sound elements irrespective of any specific playback speaker configurations, including overhead speakers. The system also introduces new efficiencies to the postproduction process, allowing sound engineers to efficiently capture all of their intent and then in real-time monitor, or automatically generate, surround-sound 7.1 and 5.1 versions.
[0013] The adaptive audio system simplifies distribution by encapsulating the audio essence and artistic intent in a single track file within a digital cinema processor, which can be faithfully played back in a broad range of theatre configurations. The system provides optimal reproduction of artistic intent when mix and render use the same channel configuration and a single inventory with downward adaptation to the rendering configuration, i.e., downmixing.
[0014] These and other advantages are provided through embodiments that are directed to a cinema sound platform, address current system limitations, and deliver an audio experience beyond presently available systems.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] In the following drawings like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
[0016] FIG. 1 is a top-level overview of an audio creation and playback environment utilizing an adaptive audio system, under an embodiment.
[0017] FIG. 2 illustrates the combination of channel and object-based data to produce an adaptive audio mix, under an embodiment.
[0018] FIG. 3 is a block diagram illustrating the workflow of creating, packaging and rendering adaptive audio content, under an embodiment.
[0019] FIG. 4 is a block diagram of a rendering stage of an adaptive audio system, under an embodiment.
[0020] FIG. 5 is a table that lists the metadata types and associated metadata elements for the adaptive audio system, under an embodiment.
[0021] FIG. 6 is a diagram that illustrates post-production and mastering for an adaptive audio system, under an embodiment.
[0022] FIG. 7 is a diagram of an example workflow for a digital cinema packaging process using adaptive audio files, under an embodiment.
[0023] FIG. 8 is an overhead view of an example layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium.
[0024] FIG. 9 is a front view of an example placement of suggested speaker locations at the screen for use in the typical auditorium.
[0025] FIG. 10 is a side view of an example layout of suggested speaker locations for use with an adaptive audio system in the typical auditorium.
[0026] FIG. 11 is an example of a positioning of top surround speakers and side surround speakers relative to the reference point, under an embodiment.
DETAILED DESCRIPTION
[0027] Systems and methods are described for an adaptive audio system and associated audio signal and data format that supports multiple rendering technologies. Aspects of the one or more embodiments described herein may be implemented in an audio or audio-visual system that processes source audio information in a mixing, rendering and playback system that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
[0028] For purposes of the present description, the following terms have the associated meanings:
[0029] Channel or audio channel: a monophonic audio signal or an audio stream plus metadata in which the position is coded as a channel ID, e.g., Left Front or Right Top Surround. A channel object may drive multiple speakers, e.g., the Left Surround channel (Ls) will feed all the speakers in the Ls array.
[0030] Channel Configuration: a pre-defined set of speaker zones with associated nominal locations, e.g., 5.1, 7.1, and so on; 5.1 refers to a six-channel surround sound audio system having front left and right channels, center channel, two surround channels, and a subwoofer channel; 7.1 refers to an eight-channel surround system that adds two additional surround channels to the 5.1 system. Examples of 5.1 and 7.1 configurations include Dolby® surround systems.
[0031] Speaker: an audio transducer or set of transducers that render an audio signal.
[0032] Speaker Zone: an array of one or more speakers that can be uniquely referenced and that receives a single audio signal, e.g., Left Surround as typically found in cinema, in particular for exclusion from or inclusion in object rendering.
[0033] Speaker Channel or Speaker-feed Channel: an audio channel that is associated with a named speaker or speaker zone within a defined speaker configuration. A speaker channel is nominally rendered using the associated speaker zone.
[0034] Speaker Channel Group: a set of one or more speaker channels corresponding to a channel configuration (e.g. a stereo track, mono track, etc.)
[0035] Object or Object Channel: one or more audio channels with a parametric source description, such as apparent source position (e.g. 3D coordinates), apparent source width, etc. An audio stream plus metadata in which the position is coded as 3D position in space.
[0036] Audio Program: the complete set of speaker channels and/or object channels and associated metadata that describes the desired spatial audio presentation.
[0037] Allocentric reference: a spatial reference in which audio objects are defined relative to features within the rendering environment such as room walls and corners, standard speaker locations, and screen location (e.g., front left corner of a room).
[0038] Egocentric reference: a spatial reference in which audio objects are defined relative to the perspective of the (audience) listener and often specified with respect to angles relative to a listener (e.g., 30 degrees right of the listener).
[0039] Frame: frames are short, independently decodable segments into which a total audio program is divided. The audio frame rate and boundaries are typically aligned with the video frames.
[0040] Adaptive audio: channel-based and/or object-based audio signals plus metadata that renders the audio signals based on the playback environment.
[0041] The cinema sound format and processing system described herein, also referred to as an "adaptive audio system," utilizes a new spatial audio description and rendering technology to allow enhanced audience immersion, more artistic control, system flexibility and scalability, and ease of installation and maintenance. Embodiments of a cinema audio platform include several discrete components including mixing tools, packer/encoder, unpack/decoder, in-theater final mix and rendering components, new speaker designs, and networked amplifiers. The system includes recommendations for a new channel
configuration to be used by content creators and exhibitors. The system utilizes a model-based description that supports several features such as: single inventory with downward and upward adaptation to the rendering configuration, i.e., delaying rendering and enabling optimal use of available speakers; improved sound envelopment, including optimized downmixing to avoid inter-channel correlation; increased spatial resolution through steer-thru arrays (e.g., an audio object dynamically assigned to one or more speakers within a surround array); and support for alternate rendering methods.
[0042] FIG. 1 is a top-level overview of an audio creation and playback environment utilizing an adaptive audio system, under an embodiment. As shown in FIG. 1 , a comprehensive, end-to-end environment 100 includes content creation, packaging, distribution and playback/rendering components across a wide number of end-point devices and use cases. The overall system 100 originates with content captured from and for a number of different use cases that comprise different user experiences 112. The content capture element 102 includes, for example, cinema, TV, live broadcast, user generated content, recorded content, games, music, and the like, and may include audio/visual or pure audio content. The content, as it progresses through the system 100 from the capture stage 102 to the final user experience 112, traverses several key processing steps through discrete system components. These process steps include pre-processing of the audio 104, authoring tools and processes 106, encoding by an audio codec 108 that captures, for example, audio data, additional metadata and reproduction information, and object channels. Various processing effects, such as compression (lossy or lossless), encryption, and the like may be applied to the object channels for efficient and secure distribution through various mediums. Appropriate endpoint- specific decoding and rendering processes 110 are then applied to reproduce and convey a particular adaptive audio user experience 112. The audio experience 112 represents the playback of the audio or audio/visual content through appropriate speakers and playback devices, and may represent any environment in which a listener is experiencing playback of the captured content, such as a cinema, concert hall, outdoor theater, a home or room, listening booth, car, game console, headphone or headset system, public address (PA) system, or any other playback environment.
[0043] The embodiment of system 100 includes an audio codec 108 that is capable of efficient distribution and storage of multichannel audio programs, and hence may be referred to as a 'hybrid' codec. The codec 108 combines traditional channel-based audio data with associated metadata to produce audio objects that facilitate the creation and delivery of audio that is adapted and optimized for rendering and playback in environments that may be different from the mixing environment. This allows the sound engineer to encode his or her intent with respect to how the final audio should be heard by the listener, based on the actual listening environment of the listener.
[0044] Conventional channel-based audio codecs operate under the assumption that the audio program will be reproduced by an array of speakers in predetermined positions relative to the listener. To create a complete multichannel audio program, sound engineers typically mix a large number of separate audio streams (e.g. dialog, music, effects) to create the overall desired impression. Audio mixing decisions are typically made by listening to the audio program as reproduced by an array of speakers in the predetermined positions, e.g., a particular 5.1 or 7.1 system in a specific theatre. The final, mixed signal serves as input to the audio codec. For reproduction, the spatially accurate sound fields are achieved only when the speakers are placed in the predetermined positions.
[0045] A new form of audio coding called audio object coding provides distinct sound sources (audio objects) as input to the encoder in the form of separate audio streams.
Examples of audio objects include dialog tracks, single instruments, individual sound effects, and other point sources. Each audio object is associated with spatial parameters, which may include, but are not limited to, sound position, sound width, and velocity information. The audio objects and associated parameters are then coded for distribution and storage. Final audio object mixing and rendering is performed at the receive end of the audio distribution chain, as part of audio program playback. This step may be based on knowledge of the actual speaker positions so that the result is an audio distribution system that is customizable to user-specific listening conditions. The two coding forms, channel-based and object-based, perform optimally for different input signal conditions. Channel-based audio coders are generally more efficient for coding input signals containing dense mixtures of different audio sources and for diffuse sounds. Conversely, audio object coders are more efficient for coding a small number of highly directional sound sources.
[0046] In an embodiment, the methods and components of system 100 comprise an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. Such a combined approach provides greater coding efficiency and rendering flexibility compared to either channel-based or object-based approaches taken separately.
[0047] Other aspects of the described embodiments include extending a predefined channel-based audio codec in a backwards-compatible manner to include audio object coding elements. A new 'extension layer' containing the audio object coding elements is defined and added to the 'base' or 'backwards compatible' layer of the channel-based audio codec bitstream. This approach enables one or more bitstreams that include the extension layer to be processed by legacy decoders, while providing an enhanced listener experience for users with new decoders. One example of an enhanced user experience includes control of audio object rendering. An additional advantage of this approach is that audio objects may be added or modified anywhere along the distribution chain without decoding/mixing/re-encoding multichannel audio encoded with the channel-based audio codec.
[0048] With regard to the frame of reference, the spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are meant to emanate from a specific region of a viewing screen or room should be played through speaker(s) located at that same relative location. Thus, the primary audio metadatum of a sound event in a model-based description is position, though other parameters such as size, orientation, velocity and acoustic dispersion can also be described. To convey position, a model-based, 3D, audio spatial description requires a 3D coordinate system. The coordinate system used for transmission (Euclidean, spherical, etc.) is generally chosen for convenience or compactness; however, other coordinate systems may be used for the rendering processing. In addition to a coordinate system, a frame of reference is required for representing the locations of objects in space. For systems to accurately reproduce position-based sound in a variety of different environments, selecting the proper frame of reference can be a critical factor. With an allocentric reference frame, an audio source position is defined relative to features within the rendering environment such as room walls and corners, standard speaker locations, and screen location. In an egocentric reference frame, locations are represented with respect to the perspective of the listener, such as "in front of me, slightly to the left," and so on. Scientific studies of spatial perception (audio and otherwise) have shown that the egocentric perspective is used almost universally. For cinema, however, an allocentric frame is generally more appropriate for several reasons. For example, the precise location of an audio object is most important when there is an associated object on screen. Using an allocentric reference, for every listening position, and for any screen size, the sound will localize at the same relative position on the screen, e.g., one-third left of the middle of the screen. Another reason is that mixers tend to think and mix in allocentric terms, and panning tools are laid out with an allocentric frame (the room walls), and mixers expect them to be rendered that way, e.g., this sound should be on screen, this sound should be off screen, or from the left wall, etc.
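A minimal sketch of the allocentric idea described above, assuming positions are authored as fractions of the room dimensions (the coordinate convention used here is illustrative only): the same authored position maps to the same relative screen location in rooms of any size.

```python
def allocentric_to_room(position, room_width_m, room_depth_m):
    """Map a normalized allocentric position to physical room coordinates.

    position is (x, y) with x in [0, 1] across the screen wall (0 = left edge,
    1 = right edge) and y in [0, 1] from the screen (0) to the back wall (1).
    Because the author specifies a fraction, a sound placed one-third of the way
    across the screen lands at the same relative point for any screen or room size.
    """
    x_norm, y_norm = position
    return (x_norm * room_width_m, y_norm * room_depth_m)

# The same authored position renders proportionally in a small and a large room.
small_room = allocentric_to_room((1 / 3, 0.0), room_width_m=8.0, room_depth_m=12.0)
large_room = allocentric_to_room((1 / 3, 0.0), room_width_m=20.0, room_depth_m=30.0)
```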
[0049] Despite the use of the allocentric frame of reference in the cinema environment, there are some cases where an egocentric frame of reference may be useful, and more appropriate. These include non-diegetic sounds, i.e., those that are not present in the "story space," e.g. mood music, for which an egocentrically uniform presentation may be desirable. Another case is near-field effects (e.g., a buzzing mosquito in the listener's left ear) that require an egocentric representation. Currently there are no means for rendering such a sound field short of using headphones or very near-field speakers. In addition, infinitely far sound sources (and the resulting plane waves) appear to come from a constant egocentric position (e.g., 30 degrees to the left), and such sounds are easier to describe in egocentric terms than in allocentric terms.
[0050] In some cases, it is possible to use an allocentric frame of reference as long as a nominal listening position is defined, while some examples require an egocentric representation that is not yet possible to render. Although an allocentric reference may be more useful and appropriate, the audio representation should be extensible, since many new features, including egocentric representation, may be more desirable in certain applications and listening environments. Embodiments of the adaptive audio system include a hybrid spatial description approach that includes a recommended channel configuration for optimal fidelity and for rendering of diffuse or complex, multi-point sources (e.g., stadium crowd, ambiance) using an egocentric reference, plus an allocentric, model-based sound description to efficiently enable increased spatial resolution and scalability.
System Components
[0051] With reference to FIG. 1, the original sound content data 102 is first processed in a pre-processing block 104. The pre-processing block 104 of system 100 includes an object channel filtering component. In many cases, audio objects contain individual sound sources to enable independent panning of sounds. In some cases, such as when creating audio programs using natural or "production" sound, it may be necessary to extract individual sound objects from a recording that contains multiple sound sources. Embodiments include a method for isolating independent source signals from a more complex signal. Undesirable elements to be separated from independent source signals may include, but are not limited to, other independent sound sources and background noise. In addition, reverb may be removed to recover "dry" sound sources.
[0052] The pre-processor 104 also includes source separation and content type detection functionality. The system provides for automated generation of metadata through analysis of input audio. Positional metadata is derived from a multi-channel recording through an analysis of the relative levels of correlated input between channel pairs. Detection of content type, such as "speech" or "music", may be achieved, for example, by feature extraction and classification.
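A simplified sketch of how positional metadata might be inferred from relative channel levels, as described above; a production analysis would also exploit inter-channel correlation, which is omitted here for brevity, and the function name is an assumption for this example.

```python
import numpy as np

def estimate_pan(left, right, eps=1e-12):
    """Estimate a pan position in [0, 1] from the relative levels of a channel pair.

    0.0 means the source sits entirely in the left channel, 1.0 in the right.
    A real analysis would also use inter-channel correlation to separate
    correlated (panned) content from independent content; this sketch only
    compares RMS levels.
    """
    rms_l = np.sqrt(np.mean(np.square(left)) + eps)
    rms_r = np.sqrt(np.mean(np.square(right)) + eps)
    return rms_r / (rms_l + rms_r)

# A source mixed mostly to the right channel yields a pan estimate near 1.0.
sig = np.random.randn(4800)
pan = estimate_pan(0.3 * sig, 0.9 * sig)
```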
Authoring Tools
[0053] The authoring tools block 106 includes features to improve the authoring of audio programs by optimizing the input and codification of the sound engineer's creative intent, allowing the engineer to create the final audio mix once, in a form that is optimized for playback in practically any playback environment. This is accomplished through the use of audio objects and positional data that is associated and encoded with the original audio content. In order to accurately place sounds around an auditorium, the sound engineer needs control over how the sound will ultimately be rendered based on the actual constraints and features of the playback environment. The adaptive audio system provides this control by allowing the sound engineer to change how the audio content is designed and mixed through the use of audio objects and positional data.
[0054] Audio objects can be considered as groups of sound elements that may be perceived to emanate from a particular physical location or locations in the auditorium. Such objects can be static, or they can move. In the adaptive audio system 100, the audio objects are controlled by metadata, which among other things, details the position of the sound at a given point in time. When objects are monitored or played back in a theatre, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a physical channel. A track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen might pan in effectively the same way as with channel-based content, but content placed in the surrounds can be rendered to an individual speaker if desired. While the use of audio objects provides desired control for discrete effects, other aspects of a movie soundtrack do work effectively in a channel-based environment. For example, many ambient effects or reverberation actually benefit from being fed to arrays of speakers. Although these could be treated as objects with sufficient width to fill an array, it is beneficial to retain some channel-based functionality.
[0055] In an embodiment, the adaptive audio system supports 'beds' in addition to audio objects, where beds are effectively channel-based sub-mixes or stems. These can be delivered for final playback (rendering) either individually, or combined into a single bed, depending on the intent of the content creator. These beds can be created in different channel-based configurations such as 5.1 and 7.1, and are extensible to more extensive formats such as 9.1, and to arrays that include overhead speakers.
[0056] FIG. 2 illustrates the combination of channel and object-based data to produce an adaptive audio mix, under an embodiment. As shown in process 200, the channel-based data 202, which, for example, may be 5.1 or 7.1 surround sound data provided in the form of pulse-code modulated (PCM) data is combined with audio object data 204 to produce an adaptive audio mix 208. The audio object data 204 is produced by combining the elements of the original channel-based data with associated metadata that specifies certain parameters pertaining to the location of the audio objects.
[0057] As shown conceptually in FIG. 2, the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously. For example, an audio program could contain one or more speaker channels optionally organized into groups (or tracks, e.g. a stereo or 5.1 track), descriptive metadata for one or more speaker channels, one or more object channels, and descriptive metadata for one or more object channels. Within one audio program, each speaker channel group, and each object channel may be represented using one or more different sample rates. For example, Digital Cinema (D-Cinema) applications support 48 kHz and 96 kHz sample rates, but other sample rates may also be supported. Furthermore, ingest, storage and editing of channels with different sample rates may also be supported.
[0058] The creation of an audio program requires the step of sound design, which includes combining sound elements as a sum of level adjusted constituent sound elements to create a new, desired sound effect. The authoring tools of the adaptive audio system enable the creation of sound effects as a collection of sound objects with relative positions using a spatio-visual sound design graphical user interface. For example, a visual representation of the sound generating object (e.g., a car) can be used as a template for assembling audio elements (exhaust note, tire hum, engine noise) as object channels containing the sound and the appropriate spatial position (at the tail pipe, the tires, the hood). The individual object channels can then be linked and manipulated as a group. The authoring tool 106 includes several user interface elements to allow the sound engineer to input control information and view mix parameters, and improve the system functionality. The sound design and authoring process is also improved by allowing object channels and speaker channels to be linked and manipulated as a group. One example is combining an object channel with a discrete, dry sound source with a set of speaker channels that contain an associated reverb signal.
[0059] The audio authoring tool 106 supports the ability to combine multiple audio channels, commonly referred to as mixing. Multiple methods of mixing are supported, and may include traditional level-based mixing and loudness-based mixing. In level-based mixing, wideband scaling is applied to the audio channels, and the scaled audio channels are then summed together. The wideband scale factors for each channel are chosen to control the absolute level of the resulting mixed signal, and also the relative levels of the mixed channels within the mixed signal. In loudness-based mixing, one or more input signals are modified using frequency dependent amplitude scaling, where the frequency dependent amplitude is chosen to provide the desired perceived absolute and relative loudness, while preserving the perceived timbre of the input sound.
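The level-based case can be sketched as follows (loudness-based mixing would replace the single wideband gain per channel with frequency-dependent gains); the function and variable names are illustrative assumptions.

```python
import numpy as np

def level_based_mix(channels, gains):
    """Mix audio channels by wideband scaling and summation.

    channels: list of equal-length sample arrays; gains: one linear scale factor
    per channel, chosen to set both the absolute level of the mix and the
    relative levels of the contributing channels.
    """
    mixed = np.zeros_like(channels[0], dtype=float)
    for signal, gain in zip(channels, gains):
        mixed += gain * np.asarray(signal, dtype=float)
    return mixed

# Dialog kept at full level, music and effects pulled down about 6 dB (gain ~0.5).
dialog, music, effects = (np.random.randn(48000) for _ in range(3))
mix = level_based_mix([dialog, music, effects], gains=[1.0, 0.5, 0.5])
```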
[0060] The authoring tools allow for the ability to create speaker channels and speaker channel groups. This allows metadata to be associated with each speaker channel group. Each speaker channel group can be tagged according to content type. The content type is extensible via a text description. Content types may include, but are not limited to, dialog, music, and effects. Each speaker channel group may be assigned unique instructions on how to upmix from one channel configuration to another, where upmixing is defined as the creation of M audio channels from N channels where M > N. Upmix instructions may include, but are not limited to, the following: an enable/disable flag to indicate if upmixing is permitted; an upmix matrix to control the mapping between each input and output channel; and default enable and matrix settings that may be assigned based on content type, e.g., enable upmixing for music only. Each speaker channel group may also be assigned unique instructions on how to downmix from one channel configuration to another, where downmixing is defined as the creation of Y audio channels from X channels where Y < X. Downmix instructions may include, but are not limited to, the following: a matrix to control the mapping between each input and output channel; and default matrix settings that can be assigned based on content type, e.g., dialog shall downmix onto the screen; effects shall downmix off the screen. Each speaker channel can also be associated with a metadata flag to disable bass management during rendering.
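A downmix instruction of the kind described above can be represented as a matrix mapping input channels to output channels; the coefficients below are conventional example values, not values specified by the described system.

```python
import numpy as np

# Hypothetical 5.1 -> stereo downmix matrix: rows are output channels (L, R),
# columns are input channels (L, R, C, LFE, Ls, Rs). Coefficients are example
# values commonly used for such downmixes.
DOWNMIX_5_1_TO_2_0 = np.array([
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],    # output L
    [0.0, 1.0, 0.707, 0.0, 0.0,   0.707],  # output R
])

def apply_downmix(matrix, input_channels):
    """Create Y output channels from X input channels (Y < X) via a matrix."""
    return matrix @ np.asarray(input_channels)

# input_channels: array of shape (6, num_samples) for a 5.1 bed.
stereo = apply_downmix(DOWNMIX_5_1_TO_2_0, np.random.randn(6, 48000))
```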
[0061] Embodiments include a feature that enables the creation of object channels and object channel groups. This invention allows metadata to be associated with each object channel group. Each object channel group can be tagged according to content type. The content type is extensible via a text description, wherein the content types may include, but are not limited to, dialog, music, and effects. Each object channel group can be assigned metadata to describe how the object(s) should be rendered.
[0062] Position information is provided to indicate the desired apparent source position. Position may be indicated using an egocentric or allocentric frame of reference. The egocentric reference is appropriate when the source position is to be referenced to the listener. For egocentric position, spherical coordinates are useful for position description. An allocentric reference is the typical frame of reference for cinema or other audio/visual presentations where the source position is referenced relative to objects in the presentation environment such as a visual display screen or room boundaries. Three-dimensional (3D) trajectory information is provided to enable the interpolation of position or for use of other rendering decisions such as enabling a "snap to mode." Size information is provided to indicate the desired apparent perceived audio source size.
[0063] Spatial quantization is provided through a "snap to closest speaker" control that indicates an intent by the sound engineer or mixer to have an object rendered by exactly one speaker (with some potential sacrifice to spatial accuracy). A limit to the allowed spatial distortion can be indicated through elevation and azimuth tolerance thresholds such that if the threshold is exceeded, the "snap" function will not occur. In addition to distance thresholds, a crossfade rate parameter can be indicated to control how quickly a moving object will transition or jump from one speaker to another when the desired position crosses between two speakers.
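A sketch of the "snap to closest speaker" behavior with azimuth and elevation tolerance thresholds might look like the following; the speaker layout, angle convention, and threshold values are illustrative assumptions.

```python
import math

def snap_to_closest_speaker(obj_az, obj_el, speakers, az_tol_deg, el_tol_deg):
    """Return the name of the speaker to snap to, or None if tolerances are exceeded.

    speakers: mapping of speaker name -> (azimuth_deg, elevation_deg).
    The object is rendered by exactly one speaker only when the nearest speaker
    lies within both the azimuth and elevation tolerance thresholds.
    """
    best_name, best_az_err, best_el_err = None, math.inf, math.inf
    for name, (spk_az, spk_el) in speakers.items():
        az_err = abs((obj_az - spk_az + 180) % 360 - 180)  # wrap to [-180, 180]
        el_err = abs(obj_el - spk_el)
        if az_err + el_err < best_az_err + best_el_err:
            best_name, best_az_err, best_el_err = name, az_err, el_err
    if best_az_err <= az_tol_deg and best_el_err <= el_tol_deg:
        return best_name
    return None  # spatial distortion would exceed the allowed limit; do not snap

layout = {"L": (30, 0), "C": (0, 0), "R": (-30, 0), "Ls": (110, 0), "Rs": (-110, 0)}
target = snap_to_closest_speaker(25, 2, layout, az_tol_deg=10, el_tol_deg=5)  # -> "L"
```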
[0064] In an embodiment, dependent spatial metadata is used for certain position metadata. For example, metadata can be automatically generated for a "slave" object by associating it with a "master" object that the slave object is to follow. A time lag or relative speed can be assigned to the slave object. Mechanisms may also be provided to allow for the definition of an acoustic center of gravity for sets or groups of objects, so that an object may be rendered such that it is perceived to move around another object. In such a case, one or more objects may rotate around an object or a defined area, such as a dominant point, or a dry area of the room. The acoustic center of gravity would then be used in the rendering stage to help determine location information for each appropriate object-based sound, even though the ultimate location information would be expressed as a location relative to the room, as opposed to a location relative to another object.
[0065] When an object is rendered it is assigned to one or more speakers according to the position metadata, and the location of the playback speakers. Additional metadata may be associated with the object to limit the speakers that shall be used. The use of restrictions can prohibit the use of indicated speakers or merely inhibit the indicated speakers (allow less energy into the speaker or speakers than would otherwise be applied). The speaker sets to be restricted may include, but are not limited to, any of the named speakers or speaker zones (e.g. L, C, R, etc.), or speaker areas, such as: front wall, back wall, left wall, right wall, ceiling, floor, speakers within the room, and so on. Likewise, in the course of specifying the desired mix of multiple sound elements, it is possible to cause one or more sound elements to become inaudible or "masked" due to the presence of other "masking" sound elements. For example, when masked elements are detected, they could be identified to the user via a graphical display.
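One possible way to express such restrictions is a per-speaker rule that either prohibits a speaker or zone (no energy) or merely inhibits it by a stated attenuation; the rule encoding below is an assumption made for illustration.

```python
def apply_speaker_restrictions(gains, restrictions):
    """Apply per-speaker restriction metadata to a set of rendering gains.

    gains: mapping of speaker/zone name -> linear gain computed by the renderer.
    restrictions: mapping of name -> "prohibit" (no energy) or an attenuation in
    dB (inhibit: allow less energy than would otherwise be applied). Names such
    as "ceiling" or "back wall" stand in for speaker areas.
    """
    adjusted = {}
    for name, gain in gains.items():
        rule = restrictions.get(name)
        if rule == "prohibit":
            adjusted[name] = 0.0
        elif isinstance(rule, (int, float)):            # attenuation in dB
            adjusted[name] = gain * 10 ** (-abs(rule) / 20)
        else:
            adjusted[name] = gain
    return adjusted

restricted = apply_speaker_restrictions(
    {"L": 0.7, "Ls": 0.5, "ceiling": 0.4},
    {"ceiling": "prohibit", "Ls": 6},   # exclude ceiling, inhibit Ls by 6 dB
)
```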
[0066] As described elsewhere, the audio program description can be adapted for rendering on a wide variety of speaker installations and channel configurations. When an audio program is authored, it is important to monitor the effect of rendering the program on anticipated playback configurations to verify that the desired results are achieved. This invention includes the ability to select target playback configurations and monitor the result. In addition, the system can automatically monitor the worst case (i.e. highest) signal levels that would be generated in each anticipated playback configuration, and provide an indication if clipping or limiting will occur.
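Worst-case level monitoring across anticipated playback configurations can be sketched as a simple peak check over the rendered speaker feeds; the configuration names and the full-scale convention below are assumptions for this example.

```python
import numpy as np

def check_playback_configs(rendered_outputs, full_scale=1.0):
    """Report whether any anticipated playback configuration would clip.

    rendered_outputs: mapping of configuration name (e.g. "5.1", "7.1") to an
    array of speaker-feed samples rendered for that configuration. The worst-case
    (highest) peak level is reported per configuration so the author can be
    warned before clipping or limiting would occur on playback.
    """
    report = {}
    for config, feeds in rendered_outputs.items():
        peak = float(np.max(np.abs(feeds)))
        report[config] = {"peak": peak, "will_clip": peak > full_scale}
    return report

report = check_playback_configs({
    "5.1": np.random.uniform(-0.8, 0.8, (6, 48000)),
    "7.1": np.random.uniform(-1.1, 1.1, (8, 48000)),   # this one would clip
})
```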
[0067] FIG. 3 is a block diagram illustrating the workflow of creating, packaging and rendering adaptive audio content, under an embodiment. The workflow 300 of FIG. 3 is divided into three distinct task groups labeled creation/authoring, packaging, and exhibition. In general, the hybrid model of beds and objects shown in FIG. 2 allows most sound design, editing, pre-mixing, and final mixing to be performed in the same manner as they are today and without adding excessive overhead to present processes. In an embodiment, the adaptive audio functionality is provided in the form of software, firmware or circuitry that is used in conjunction with sound production and processing equipment, wherein such equipment may be new hardware systems or updates to existing systems. For example, plug-in applications may be provided for digital audio workstations to allow existing panning techniques within sound design and editing to remain unchanged. In this way, it is possible to lay down both beds and objects within the workstation in 5.1 or similar surround-equipped editing rooms. Object audio and metadata is recorded in the session in preparation for the pre- and final-mix stages in the dubbing theatre.
[0068] As shown in FIG. 3, the creation or authoring tasks involve inputting mixing controls 302 by a user, e.g., a sound engineer in the following example, to a mixing console or audio workstation 304. In an embodiment, metadata is integrated into the mixing console surface, allowing the channel strips' faders, panning and audio processing to work with both beds or stems and audio objects. The metadata can be edited using either the console surface or the workstation user interface, and the sound is monitored using a rendering and mastering unit (RMU) 306. The bed and object audio data and associated metadata is recorded during the mastering session to create a 'print master,' which includes an adaptive audio mix 310 and any other rendered deliverables (such as a surround 7.1 or 5.1 theatrical mix) 308.
Existing authoring tools (e.g. digital audio workstations such as Pro Tools) may be used to allow sound engineers to label individual audio tracks within a mix session. Embodiments extend this concept by allowing users to label individual sub-segments within a track to aid in finding or quickly identifying audio elements. The user interface to the mixing console that enables definition and creation of the metadata may be implemented through graphical user interface elements, physical controls (e.g., sliders and knobs), or any combination thereof.
[0069] In the packaging stage, the print master file is wrapped using industry-standard MXF wrapping procedures, hashed and optionally encrypted in order to ensure integrity of the audio content for delivery to the digital cinema packaging facility. This step may be performed by a digital cinema processor (DCP) 312 or any appropriate audio processor depending on the ultimate playback environment, such as a standard surround-sound equipped theatre 318, an adaptive audio-enabled theatre 320, or any other playback environment. As shown in FIG. 3, the processor 312 outputs the appropriate audio signals 314 and 316 depending on the exhibition environment.
[0070] In an embodiment, the adaptive audio print master contains an adaptive audio mix, along with a standard DCI-compliant Pulse Code Modulated (PCM) mix. The PCM mix can be rendered by the rendering and mastering unit in a dubbing theatre, or created by a separate mix pass if desired. PCM audio forms the standard main audio track file within the digital cinema processor 312, and the adaptive audio forms an additional track file. Such a track file may be compliant with existing industry standards, and is ignored by DCI-compliant servers that cannot use it.
[0071] In an example cinema playback environment, the DCP containing an adaptive audio track file is recognized by a server as a valid package, ingested into the server, and then streamed to an adaptive audio cinema processor. In a system that has both linear PCM and adaptive audio files available, the system can switch between them as necessary. For distribution to the exhibition stage, the adaptive audio packaging scheme allows a single type of package to be delivered to a cinema. The DCP package contains both PCM and adaptive audio files. The use of security keys, such as a key delivery message (KDM), may be incorporated to enable secure delivery of movie content, or other similar content.
[0072] As shown in FIG. 3, the adaptive audio methodology is realized by enabling a sound engineer to express his or her intent with regard to the rendering and playback of audio content through the audio workstation 304. By controlling certain input controls, the engineer is able to specify where and how audio objects and sound elements are played back depending on the listening environment. Metadata is generated in the audio workstation 304 in response to the engineer's mixing inputs 302 to provide rendering cues that control spatial parameters (e.g., position, velocity, intensity, timbre, etc.) and specify which speaker(s) or speaker groups in the listening environment play respective sounds during exhibition. The metadata is associated with the respective audio data in the workstation 304 or RMU 306 for packaging and transport by DCP 312.
[0073] A graphical user interface and software tools that provide control of the workstation 304 by the engineer comprise at least part of the authoring tools 106 of FIG. 1.
Hybrid Audio Codec
[0074] As shown in FIG. 1, system 100 includes a hybrid audio codec 108. This component comprises an audio encoding, distribution, and decoding system that is configured to generate a single bitstream containing both conventional channel-based audio elements and audio object coding elements. The hybrid audio coding system is built around a channel- based encoding system that is configured to generate a single (unified) bitstream that is simultaneously compatible with (i.e., decodable by) a first decoder configured to decode audio data encoded in accordance with a first encoding protocol (channel-based) and one or more secondary decoders configured to decode audio data encoded in accordance with one or more secondary encoding protocols (object-based). The bitstream can include both encoded data (in the form of data bursts) decodable by the first decoder (and ignored by any secondary decoders) and encoded data (e.g., other bursts of data) decodable by one or more secondary decoders (and ignored by the first decoder). The decoded audio and associated information (metadata) from the first and one or more of the secondary decoders can then be combined in a manner such that both the channel-based and object-based information is rendered simultaneously to recreate a facsimile of the environment, channels, spatial information, and objects presented to the hybrid coding system (i.e. within a 3D space or listening
environment).
[0075] The codec 108 generates a bitstream containing coded audio information and information relating to multiple sets of channel positions (speakers). In one embodiment, one set of channel positions is fixed and used for the channel based encoding protocol, while another set of channel positions is adaptive and used for the audio object based encoding protocol, such that the channel configuration for an audio object may change as a function of time (depending on where the object is placed in the sound field). Thus, the hybrid audio coding system may carry information about two sets of speaker locations for playback, where one set may be fixed and be a subset of the other. Devices supporting legacy coded audio information would decode and render the audio information from the fixed subset, while a device capable of supporting the larger set could decode and render the additional coded audio information that would be time-varyingly assigned to different speakers from the larger set. Moreover, the system is not dependent on the first and one or more of the secondary decoders being simultaneously present within a system and/or device. Hence, a legacy and/or existing device/system containing only a decoder supporting the first protocol would yield a fully compatible sound field to be rendered via traditional channel-based reproduction systems. In this case, the unknown or unsupported portion(s) of the hybrid-bitstream protocol (i.e., the audio information represented by a secondary encoding protocol) would be ignored by the system or device decoder supporting the first hybrid encoding protocol.
[0076] In another embodiment, the codec 108 is configured to operate in a mode where the first encoding subsystem (supporting the first protocol) contains a combined
representation of all the sound field information (channels and objects) represented in both the first and one or more of the secondary encoder subsystems present within the hybrid encoder. This ensures that the hybrid bitstream includes backward compatibility with decoders supporting only the first encoder subsystem's protocol by allowing audio objects (typically carried in one or more secondary encoder protocols) to be represented and rendered within decoders supporting only the first protocol.
[0077] In yet another embodiment, the codec 108 includes two or more encoding subsystems, where each of these subsystems is configured to encode audio data in accordance with a different protocol, and is configured to combine the outputs of the subsystems to generate a hybrid-format (unified) bitstream.
[0078] One of the benefits of the embodiments is the ability for a hybrid coded audio bitstream to be carried over a wide range of content distribution systems, where each of the distribution systems conventionally supports only data encoded in accordance with the first encoding protocol. This eliminates the need for any system and/or transport level protocol modifications/changes in order to specifically support the hybrid coding system.
[0079] Audio encoding systems typically utilize standardized bitstream elements to enable the transport of additional (arbitrary) data within the bitstream itself. This additional (arbitrary) data is typically skipped (i.e., ignored) during decoding of the encoded audio included in the bitstream, but may be used for a purpose other than decoding. Different audio coding standards express these additional data fields using unique nomenclature. Bitstream elements of this general type may include, but are not limited to, auxiliary data, skip fields, data stream elements, fill elements, ancillary data, and substream elements. Unless otherwise noted, usage of the expression "auxiliary data" in this document does not imply a specific type or format of additional data, but rather should be interpreted as a generic expression that encompasses any or all of the examples associated with the present invention.
[0080] A data channel enabled via "auxiliary" bitstream elements of a first encoding protocol within a combined hybrid coding system bitstream could carry one or more secondary (independent or dependent) audio bitstreams (encoded in accordance with one or more secondary encoding protocols). The one or more secondary audio bitstreams could be split into N-sample blocks and multiplexed into the "auxiliary data" fields of a first bitstream. The first bitstream is decodable by an appropriate (complement) decoder. In addition, the auxiliary data of the first bitstream could be extracted, recombined into one or more secondary audio bitstreams, decoded by a processor supporting the syntax of one or more of the secondary bitstreams, and then combined and rendered together or independently.
Moreover, it is also possible to reverse the roles of the first and second bitstreams, so that blocks of data of a first bitstream are multiplexed into the auxiliary data of a second bitstream.
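The multiplexing of a secondary bitstream into the auxiliary-data fields of a first bitstream, as described above, can be sketched as follows; the frame representation here is a stand-in and does not reflect any actual bitstream syntax.

```python
def mux_into_aux(primary_frames, secondary_bitstream, block_size):
    """Multiplex a secondary bitstream into the auxiliary fields of a primary stream.

    The secondary bitstream (bytes) is split into fixed-size blocks and one block
    is attached to the auxiliary field of each primary frame. A legacy decoder
    skips the auxiliary data; an enhanced decoder extracts and reassembles it.
    """
    blocks = [secondary_bitstream[i:i + block_size]
              for i in range(0, len(secondary_bitstream), block_size)]
    muxed = []
    for i, frame in enumerate(primary_frames):
        aux = blocks[i] if i < len(blocks) else b""
        muxed.append({"audio": frame, "aux_data": aux})
    return muxed

def demux_aux(muxed_frames):
    """Recombine the auxiliary blocks into the original secondary bitstream."""
    return b"".join(frame["aux_data"] for frame in muxed_frames)
```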
[0081] Bitstream elements associated with a secondary encoding protocol also carry and convey information (metadata) characteristics of the underlying audio, which may include, but are not limited to, desired sound source position, velocity, and size. This metadata is utilized during the decoding and rendering processes to re-create the proper (i.e., original) position for the associated audio object carried within the applicable bitstream. It is also possible to carry the metadata described above, which is applicable to the audio objects contained in the one or more secondary bitstreams present in the hybrid stream, within bitstream elements associated with the first encoding protocol.
[0082] Bitstream elements associated with either or both the first and second encoding protocols of the hybrid coding system carry/convey contextual metadata that identifies spatial parameters (i.e., properties of the signal essence itself) and further information describing the underlying audio essence type in the form of specific audio classes that are carried within the hybrid coded audio bitstream. Such metadata could indicate, for example, the presence of spoken dialogue, music, dialogue over music, applause, singing voice, etc., and could be utilized to adaptively modify the behavior of interconnected pre- or post-processing modules upstream or downstream of the hybrid coding system.
[0083] In an embodiment, the codec 108 is configured to operate with a shared or common bit pool in which bits available for coding are "shared" between all or part of the encoding subsystems supporting one or more protocols. Such a codec may distribute the available bits (from the common "shared" bit pool) between the encoding subsystems in order to optimize the overall audio quality of the unified bitstream. For example, during a first time interval, the codec may assign more of the available bits to a first encoding subsystem, and fewer of the available bits to the remaining subsystems, while during a second time interval, the codec may assign fewer of the available bits to the first encoding subsystem, and more of the available bits to the remaining subsystems. The decision of how to assign bits between encoding subsystems may be dependent, for example, on results of statistical analysis of the shared bit pool, and/or analysis of the audio content encoded by each subsystem. The codec may allocate bits from the shared pool in such a way that a unified bitstream constructed by multiplexing the outputs of the encoding subsystems maintains a constant frame length/bitrate over a specific time interval. It is also possible, in some cases, for the frame length/bitrate of the unified bitstream to vary over a specific time interval.
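A sketch of shared bit-pool allocation between two encoding subsystems over successive time intervals, assuming a complexity score per subsystem derived from analysis of the content each subsystem is encoding (the scoring itself is not shown); names and values are illustrative.

```python
def allocate_shared_bits(total_bits, complexities):
    """Split a shared bit pool between encoding subsystems for one time interval.

    complexities: one non-negative score per subsystem; more complex content
    receives more of the pool. The per-interval total stays constant, so a
    unified bitstream built by multiplexing the subsystem outputs keeps a
    constant frame length over the interval.
    """
    total_complexity = sum(complexities) or 1.0
    allocation = [int(total_bits * c / total_complexity) for c in complexities]
    allocation[0] += total_bits - sum(allocation)  # rounding remainder to subsystem 0
    return allocation

# Interval 1: the channel bed is busy, objects are sparse. Interval 2: the reverse.
interval_1 = allocate_shared_bits(256_000, [3.0, 1.0])
interval_2 = allocate_shared_bits(256_000, [1.0, 3.0])
```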
[0084] In an alternative embodiment, the codec 108 generates a unified bitstream including data encoded in accordance with the first encoding protocol configured and transmitted as an independent substream of an encoded data stream (which a decoder supporting the first encoding protocol will decode), and data encoded in accordance with a second protocol sent as an independent or dependent substream of the encoded data stream (one which a decoder supporting the first protocol will ignore). More generally, in a class of embodiments the codec generates a unified bitstream including two or more independent or dependent substreams (where each substream includes data encoded in accordance with a different or identical encoding protocol).
[0085] In yet another alternative embodiment, the codec 108 generates a unified bitstream including data encoded in accordance with the first encoding protocol configured and transmitted with a unique bitstream identifier (which a decoder supporting a first encoding protocol associated with the unique bitstream identifier will decode), and data encoded in accordance with a second protocol configured and transmitted with a unique bitstream identifier, which a decoder supporting the first protocol will ignore. More generally, in a class of embodiments the codec generates a unified bitstream including two or more substreams (where each substream includes data encoded in accordance with a different or identical encoding protocol and where each carries a unique bitstream identifier). The methods and systems for creating a unified bitstream described above provide the ability to unambiguously signal (to a decoder) which interleaving and/or protocol has been utilized within a hybrid bitstream (e.g., to signal whether the AUX data, SKIP, DSE, or substream approach described herein is utilized).
[0086] The hybrid coding system is configured to support de-interleaving/demultiplexing and re-interleaving/re-multiplexing of bitstreams supporting one or more secondary protocols into a first bitstream (supporting a first protocol) at any processing point found throughout a media delivery system. The hybrid codec is also configured to be capable of encoding audio input streams with different sample rates into one bitstream. This provides a means for efficiently coding and distributing audio sources containing signals with inherently different bandwidths. For example, dialog tracks typically have inherently lower bandwidth than music and effects tracks.
Rendering
[0087] Under an embodiment, the adaptive audio system allows multiple (e.g., up to 128) tracks to be packaged, usually as a combination of beds and objects. The basic format of the audio data for the adaptive audio system comprises a number of independent monophonic audio streams. Each stream has associated with it metadata that specifies whether the stream is a channel-based stream or an object-based stream. The channel-based streams have rendering information encoded by means of channel name or label; and the object-based streams have location information encoded through mathematical expressions encoded in further associated metadata. The original independent audio streams are then packaged as a single serial bitstream that contains all of the audio data in an ordered fashion. This adaptive data configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the ultimate rendering location of a sound is based on the playback environment to correspond to the mixer's intent. Thus, a sound can be specified to originate from a frame of reference of the playback room (e.g., middle of left wall), rather than a specific labeled speaker or speaker group (e.g., left surround). The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
[0088] The renderer takes the bitstream encoding the audio tracks, and processes the content according to the signal type. Beds are fed to arrays, which will potentially require different delays and equalization processing than individual objects. The process supports rendering of these beds and objects to multiple (up to 64) speaker outputs. FIG. 4 is a block diagram of a rendering stage of an adaptive audio system, under an embodiment. As shown in system 400 of FIG. 4, a number of input signals, such as up to 128 audio tracks that comprise the adaptive audio signals 402, are provided by certain components of the creation, authoring and packaging stages of system 300, such as RMU 306 and processor 312. These signals comprise the channel-based beds and objects that are utilized by the renderer 404. The channel-based audio (beds) and objects are input to a level manager 406 that provides control over the output levels or amplitudes of the different audio components. Certain audio components may be processed by an array correction component 408. The adaptive audio signals are then passed through a B-chain processing component 410, which generates a number (e.g., up to 64) of speaker feed output signals. In general, the B-chain feeds refer to the signals processed by power amplifiers, crossovers and speakers, as opposed to A-chain content that constitutes the sound track on the film stock.
[0089] In an embodiment, the renderer 404 runs a rendering algorithm that intelligently uses the surround speakers in the theatre to the best of their ability. By improving the power handling and frequency response of the surround speakers, and keeping the same monitoring reference level for each output channel or speaker in the theatre, objects being panned between screen and surround speakers can maintain their sound pressure level and have a closer timbre match without, importantly, increasing the overall sound pressure level in the theatre. An array of appropriately specified surround speakers will typically have sufficient headroom to reproduce the maximum dynamic range available within a surround 7.1 or 5.1 soundtrack (i.e., 20 dB above reference level); however, it is unlikely that a single surround speaker will have the same headroom as a large multi-way screen speaker. As a result, there will likely be instances when an object placed in the surround field requires a sound pressure greater than that attainable using a single surround speaker. In these cases, the renderer will spread the sound across an appropriate number of speakers in order to achieve the required sound pressure level. The adaptive audio system improves the quality and power handling of surround speakers to provide an improvement in the faithfulness of the rendering. It provides support for bass management of the surround speakers through the use of optional rear subwoofers that allow each surround speaker to achieve improved power handling, while simultaneously potentially utilizing smaller speaker cabinets. It also allows the addition of side surround speakers closer to the screen than current practice to ensure that objects can smoothly transition from screen to surround.
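Assuming incoherent power summation across speakers, the number of surround speakers needed to reach a required sound pressure level can be estimated as in the following sketch; the specific levels and the summation assumption are illustrative, not values prescribed by the described system.

```python
import math

def speakers_needed(required_spl_db, single_speaker_max_db):
    """Estimate how many surround speakers must share an object to reach a target SPL.

    Assumes incoherent power summation, where N speakers driven at the same level
    give roughly 10*log10(N) dB more acoustic power than one speaker. This is a
    simplification of the spreading behavior described above.
    """
    deficit_db = required_spl_db - single_speaker_max_db
    if deficit_db <= 0:
        return 1
    return math.ceil(10 ** (deficit_db / 10))

# An object needs 105 dB SPL but a single surround speaker tops out at 101 dB:
count = speakers_needed(105, 101)   # 4 dB deficit -> spread across 3 speakers
```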
[0090] Through the use of metadata to specify location information of audio objects along with certain rendering processes, system 400 provides a comprehensive, flexible method for content creators to move beyond the constraints of existing systems. As stated previously, current systems create and distribute audio that is fixed to particular speaker locations with limited knowledge of the type of content conveyed in the audio essence (the part of the audio that is played back). The adaptive audio system 100 provides a new hybrid approach that includes the option for both speaker location specific audio (left channel, right channel, etc.) and object oriented audio elements that have generalized spatial information which may include, but is not limited to, position, size and velocity. This hybrid approach balances fidelity (provided by fixed speaker locations) with flexibility in rendering (generalized audio objects). The system also provides additional useful information about the audio content that is paired with the audio essence by the content creator at the time of content creation. This information provides powerful, detailed information on the attributes of the audio that can be used in very powerful ways during rendering. Such attributes may include, but are not limited to, content type (dialog, music, effect, Foley, background/ambience, etc.), spatial attributes (3D position, 3D size, velocity), and rendering information (snap to speaker location, channel weights, gain, bass management information, etc.).
[0091] The adaptive audio system described herein provides powerful information that can be used for rendering by a widely varying number of end points. In many cases the optimal rendering technique applied depends greatly on the end point device. For example, home theater systems and soundbars may have 2, 3, 5, 7 or even 9 separate speakers. Many other types of systems, such as televisions, computers, and music docks have only two speakers, and nearly all commonly used devices have a binaural headphone output (PC, laptop, tablet, cell phone, music player, etc.). However, for traditional audio that is distributed today (mono, stereo, 5.1, 7.1 channels) the end point devices often need to make simplistic decisions and compromises to render and reproduce audio that is now distributed in a channel/speaker specific form. In addition there is little or no information conveyed about the actual content that is being distributed (dialog, music, ambience, etc.) and little or no information about the content creator's intent for audio reproduction. However, the adaptive audio system 100 provides this information and, potentially, access to audio objects, which can be used to create a compelling next generation user experience.
[0092] The system 100 allows the content creator to embed the spatial intent of the mix within the bitstream using metadata such as position, size, velocity, and so on, through a unique and powerful metadata and adaptive audio transmission format. This allows a great deal of flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, adaptive audio enables the adaptation of the mix to the exact position of the speakers in a particular room in order to avoid spatial distortion that occurs when the geometry of the playback system is not identical to the authoring system. In current audio reproduction systems where only audio for a speaker channel is sent, the intent of the content creator is unknown. System 100 uses metadata conveyed throughout the creation and distribution pipeline. An adaptive audio-aware reproduction system can use this metadata information to reproduce the content in a manner that matches the original intent of the content creator. Likewise, the mix can be adapted to the exact hardware configuration of the reproduction system. At present, there exist many different possible speaker configurations and types in rendering equipment such as televisions, home theaters, soundbars, portable music player docks, etc. When these systems are sent channel-specific audio information today (i.e. left and right channel audio or multichannel audio), the system must process the audio to appropriately match the capabilities of the rendering equipment. An example is standard stereo audio being sent to a soundbar with more than two speakers. Some soundbars, for instance, have side firing speakers to create a sense of envelopment; with adaptive audio, spatial information and content type (such as ambient effects) can be used by the soundbar to send only the appropriate audio to these side firing speakers.
[0093] The adaptive audio system allows for unlimited interpolation of speakers in a system on all front/back, left/right, up/down, near/far dimensions. In current audio reproduction systems, no information exists for how to handle audio where it may be desired to position the audio such that it is perceived by a listener to be between two speakers. At present, with audio that is only assigned to a specific speaker, a spatial quantization factor is introduced. With adaptive audio, the spatial positioning of the audio can be known accurately and reproduced accordingly on the audio reproduction system.
[0094] With respect to headphone rendering, the creator's intent is realized by matching Head Related Transfer Functions (HRTF) to the spatial position. When audio is reproduced over headphones, spatial virtualization can be achieved by the application of a Head Related Transfer Function, which processes the audio, adding perceptual cues that create the perception of the audio being played in 3D space and not over headphones. The accuracy of the spatial reproduction is dependent on the selection of the appropriate HRTF, which can vary based on several factors including the spatial position. Using the spatial information provided by the adaptive audio system can result in the selection of one HRTF, or a continually varying number of HRTFs, to greatly improve the reproduction experience.
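As a rough illustration of position-dependent HRTF selection, the sketch below picks the measured HRTF whose direction lies closest to the object position carried in the metadata. The database layout and function names are hypothetical, and a production renderer would typically interpolate between neighboring measurements rather than switch abruptly.

```python
import math

def angular_distance(az1, el1, az2, el2):
    """Great-circle angle (radians) between two directions given as
    azimuth/elevation pairs in radians."""
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.acos(max(-1.0, min(1.0, cos_d)))

def select_hrtf(obj_az, obj_el, hrtf_database):
    """Return the HRTF pair measured closest to the object direction.

    `hrtf_database` is a hypothetical dict mapping (az, el) in radians to
    (left_impulse_response, right_impulse_response)."""
    return min(hrtf_database.items(),
               key=lambda item: angular_distance(obj_az, obj_el, *item[0]))[1]
```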
[0095] The spatial information conveyed by the adaptive audio system can be not only used by a content creator to create a compelling entertainment experience (film, television, music, etc.), but the spatial information can also indicate where a listener is positioned relative to physical objects such as buildings or geographic points of interest. This would allow the user to interact with a virtualized audio experience that is related to the real-world, i.e., augmented reality.
[0096] Embodiments also enable spatial upmixing, performing enhanced upmixing by reading the metadata when the object audio data is not available. Knowing the position of all objects and their types allows the upmixer to better differentiate elements within the channel-based tracks. Existing upmixing algorithms have to infer information such as the audio content type (speech, music, ambient effects) as well as the position of different elements within the audio stream to create a high quality upmix with minimal or no audible artifacts. Many times the inferred information may be incorrect or inappropriate. With adaptive audio, the additional information available from the metadata related to, for example, audio content type, spatial position, velocity, audio object size, etc., can be used by an upmixing algorithm to create a high quality reproduction result. The system also spatially matches the audio to the video by accurately positioning on-screen audio objects to their corresponding visual elements. In this case, a compelling audio/video reproduction experience is possible, particularly with larger screen sizes, if the reproduced spatial location of some audio elements matches image elements on the screen. An example is having the dialog in a film or television program spatially coincide with a person or character that is speaking on the screen. With normal speaker channel based audio there is no easy method to determine where the dialog should be spatially positioned to match the location of the person or character on-screen. With the audio information available with adaptive audio, such audio/visual alignment can be achieved. The visual positional and audio spatial alignment can also be used for non-character/dialog objects such as cars, trucks, animation, and so on.
[0097] Spatial masking processing is facilitated by system 100, since knowledge of the spatial intent of a mix through the adaptive audio metadata means that the mix can be adapted to any speaker configuration. However, one runs the risk of downmixing objects in the same or almost the same location because of playback system limitations. For example, an object meant to be panned in the left rear might be downmixed to the left front if surround channels are not present, but if a louder element occurs in the left front at the same time, the downmixed object will be masked and disappear from the mix. Using adaptive audio metadata, spatial masking may be anticipated by the renderer, and the spatial and/or loudness downmix parameters of each object may be adjusted so all audio elements of the mix remain just as perceptible as in the original mix. Because the renderer understands the spatial relationship between the mix and the playback system, it has the ability to "snap" objects to the closest speakers instead of creating a phantom image between two or more speakers. While this may slightly distort the spatial representation of the mix, it also allows the renderer to avoid an unintended phantom image. For example, if the angular position of the mixing stage's left speaker does not correspond to the angular position of the playback system's left speaker, using the snap to closest speaker function could avoid having the playback system reproduce a constant phantom image of the mixing stage's left channel.
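The snap behaviour described above can be approximated with the following sketch, which compares the object's intended direction against the angular position of each physical speaker (both expressed from the reference listening position) and routes the object to the closest one. The speaker-list format is an assumption for illustration, not the set-up file format defined elsewhere in this description.

```python
import math

def snap_to_closest_speaker(object_xyz, speakers):
    """Return the name of the speaker whose direction from the reference
    position is closest to the object's intended direction.

    `object_xyz` is the object position relative to the reference position;
    `speakers` is a hypothetical mapping of speaker name -> (x, y, z)."""
    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(c / n for c in v)

    obj_dir = unit(object_xyz)

    def angle_to(spk_xyz):
        spk_dir = unit(spk_xyz)
        dot = sum(a * b for a, b in zip(obj_dir, spk_dir))
        return math.acos(max(-1.0, min(1.0, dot)))

    return min(speakers, key=lambda name: angle_to(speakers[name]))

# Example with a hypothetical 5.1-style layout (positions in meters):
layout = {"L": (-2.0, 3.0, 0.0), "R": (2.0, 3.0, 0.0), "C": (0.0, 3.0, 0.0),
          "Ls": (-3.0, -1.0, 0.0), "Rs": (3.0, -1.0, 0.0)}
print(snap_to_closest_speaker((-1.0, 2.5, 0.0), layout))  # -> "L"
```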
[0098] With respect to content processing, the adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a large amount of flexibility in the processing of audio prior to reproduction. From a content processing and rendering standpoint, the adaptive audio system enables processing to be adapted to the type of object. For example, dialog enhancement may be applied to dialog objects only. Dialog
enhancement refers to a method of processing audio that contains dialog such that the audibility and/or intelligibility of the dialog is increased and/or improved. In many cases the audio processing that is applied to dialog is inappropriate for non-dialog audio content (i.e. music, ambient effects, etc.) and can result in objectionable audible artifacts. With adaptive audio, an audio object could contain only the dialog in a piece of content, and it can be labeled accordingly so that a rendering solution could selectively apply dialog enhancement to only the dialog content. In addition, if the audio object is only dialog (and not a mixture of dialog and other content which is often the case), then the dialog enhancement processing can process dialog exclusively (thereby limiting any processing being performed on any other content). Likewise, bass management (filtering, attenuation, gain) can be targeted at specific objects based on their type. Bass management refers to selectively isolating and processing only the bass (or lower) frequencies in a particular piece of content. With current audio systems and delivery mechanisms this is a "blind" process that is applied to all of the audio. With adaptive audio, specific audio objects for which bass management is appropriate can be identified by the metadata, and the rendering processing can be applied appropriately.
[0099] The adaptive audio system 100 also provides for object-based dynamic range compression and selective upmixing. Traditional audio tracks have the same duration as the content itself, while an audio object might occur for only a limited amount of time in the content. The metadata associated with an object can contain information about its average and peak signal amplitude, as well as its onset or attack time (particularly for transient material). This information would allow a compressor to better adapt its compression and time constants (attack, release, etc.) to better suit the content. For selective upmixing, content creators might choose to indicate in the adaptive audio bitstream whether an object should be upmixed or not. This information allows the adaptive audio renderer and upmixer to distinguish which audio elements can be safely upmixed, while respecting the creator's intent.
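One way to picture object-based compression is the sketch below, in which hypothetical per-object metadata fields for peak level, average level and onset time are used to pre-configure a compressor before the object is rendered. The field names and the simple mapping rules are assumptions for illustration, not the metadata format defined by the system.

```python
def configure_compressor(obj_metadata, threshold_db=-12.0):
    """Derive compressor settings from per-object metadata.

    `obj_metadata` is a hypothetical dict with 'peak_db', 'average_db'
    and 'onset_ms' entries describing the object ahead of playback."""
    crest_db = obj_metadata["peak_db"] - obj_metadata["average_db"]
    return {
        # Transient material (short onset) gets a faster attack so peaks
        # are caught; sustained material keeps a slower, gentler attack.
        "attack_ms": min(obj_metadata["onset_ms"] * 0.5, 20.0),
        # High crest factor suggests percussive content -> faster release.
        "release_ms": 50.0 if crest_db > 12.0 else 200.0,
        # Ratio only needs to be high enough to pull the known peak
        # down toward the threshold.
        "ratio": max(1.0, (obj_metadata["peak_db"] - threshold_db) / 6.0),
    }

print(configure_compressor({"peak_db": -3.0, "average_db": -18.0, "onset_ms": 8.0}))
```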
[00100] Embodiments also allow the adaptive audio system to select a preferred rendering algorithm from a number of available rendering algorithms and/or surround sound formats. Examples of available rendering algorithms include: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, and raw stems with position metadata. Others include dual balance and vector-based amplitude panning.
[00101] The binaural distribution format uses a two-channel representation of a sound field in terms of the signal present at the left and right ears. Binaural information can be created via in-ear recording or synthesized using HRTF models. Playback of a binaural representation is typically done over headphones, or by employing cross-talk cancellation. Playback over an arbitrary speaker set-up would require signal analysis to determine the associated sound field and/or signal source(s).
[00102] The stereo dipole rendering method is a transaural cross-talk cancellation process that makes binaural signals playable over stereo speakers (e.g., at +10 and -10 degrees off center).
[00103] Ambisonics is both a distribution format and a rendering method, encoded in a four-channel form called B-format. The first channel, W, is the non-directional pressure signal; the second channel, X, is the directional pressure gradient containing the front/back information; the third channel, Y, contains the left/right information; and the fourth channel, Z, contains the up/down information. These channels define a first-order sample of the complete soundfield at a point. Ambisonics uses all available speakers to recreate the sampled (or synthesized) soundfield within the speaker array such that when some speakers are pushing, others are pulling.
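A minimal sketch of first-order B-format encoding, assuming the conventional formulation in which a monophonic source at azimuth θ and elevation φ is weighted onto the four channels; the 1/√2 weight on W is the traditional convention, and exact normalization varies between Ambisonics flavours.

```python
import math

def encode_bformat(sample: float, azimuth: float, elevation: float):
    """Encode one monophonic sample into first-order B-format (W, X, Y, Z).
    Azimuth and elevation are in radians; azimuth 0 is straight ahead,
    positive azimuth is to the left."""
    w = sample * (1.0 / math.sqrt(2.0))                   # omnidirectional pressure
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front/back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left/right
    z = sample * math.sin(elevation)                      # up/down
    return w, x, y, z

# A source 90 degrees to the left contributes only to W and Y:
print(encode_bformat(1.0, math.pi / 2, 0.0))
```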
[00104] Wave Field Synthesis is a rendering method of sound reproduction, based on the precise construction of the desired wave field by secondary sources. WFS is based on Huygens' principle, and is implemented as speaker arrays (tens or hundreds) that ring the listening space and operate in a coordinated, phased fashion to re-create each individual sound wave.
[00105] Multi-channel panning is a distribution format and/or rendering method, and may be referred to as channel-based audio. In this case, sound is represented as a number of discrete sources to be played back through an equal number of speakers at defined angles from the listener. The content creator / mixer can create virtual images by panning signals between adjacent channels to provide direction cues; early reflections, reverb, etc., can be mixed into many channels to provide direction and environmental cues.
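The pair-wise panning described above is commonly implemented with a constant-power law, sketched below for a source panned between two adjacent channels. This is a generic illustration of the technique rather than the specific panner used in the adaptive audio system.

```python
import math

def constant_power_pan(pan: float):
    """Return (gain_a, gain_b) for a source panned between two adjacent
    speakers; pan runs from 0.0 (fully at speaker A) to 1.0 (fully at B).
    Gains satisfy g_a**2 + g_b**2 == 1, keeping total power constant."""
    angle = pan * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

print(constant_power_pan(0.0))  # (1.0, 0.0)       -> all signal at speaker A
print(constant_power_pan(0.5))  # (~0.707, ~0.707) -> centered phantom image
```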
[00106] Raw stems with position metadata is a distribution format, and may also be referred to as object-based audio. In this format, distinct, "close mic'ed," sound sources are represented along with position and environmental metadata. Virtual sources are rendered based on the metadata and playback equipment and listening environment.
[00107] The adaptive audio format is a hybrid of the multi-channel panning format and the raw stems format. The rendering method in a present embodiment is multi-channel panning. For the audio channels, the rendering (panning) happens at authoring time, while for objects the rendering (panning) happens at playback.
Metadata and Adaptive Audio Transmission Format
[00108] As stated above, metadata is generated during the creation stage to encode certain positional information for the audio objects and to accompany an audio program to aid in rendering the audio program, and in particular, to describe the audio program in a way that enables rendering the audio program on a wide variety of playback equipment and playback environments. The metadata is generated for a given program by the editors and mixers that create, collect, edit and manipulate the audio during post-production. An important feature of the adaptive audio format is the ability to control how the audio will translate to playback systems and environments that differ from the mix environment. In particular, a given cinema may have lesser capabilities than the mix environment.
[00109] The adaptive audio renderer is designed to make the best use of the equipment available to re-create the mixer's intent. Further, the adaptive audio authoring tools allow the mixer to preview and adjust how the mix will be rendered on a variety of playback configurations. All of the metadata values can be conditioned on the playback environment and speaker configuration. For example, a different mix level for a given audio element can be specified based on the playback configuration or mode. In an embodiment, the list of conditioned playback modes is extensible and includes the following: (1) channel-based only playback: 5.1, 7.1, 7.1 (height), 9.1; and (2) discrete speaker playback: 3D, 2D (no height).
[00110] In an embodiment, the metadata controls or dictates different aspects of the adaptive audio content and is organized based on different types including: program metadata, audio metadata, and rendering metadata (for channel and object). Each type of metadata includes one or more metadata items that provide values for characteristics that are referenced by an identifier (ID). FIG. 5 is a table that lists the metadata types and associated metadata elements for the adaptive audio system, under an embodiment.
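As one hypothetical way to picture how the three metadata categories might be held in code, the sketch below groups a few of the elements named in FIG. 5 into simple containers. The field names follow the tables that follow, but the structure itself is an illustration and not part of the defined transmission format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ProgramMetadata:
    frame_rate: float                                    # frames/sec, e.g. 24
    n_tracks: int                                        # audio tracks per frame
    mix_room_dim: Optional[Tuple[float, float, float]] = None  # meters

@dataclass
class AudioMetadata:
    sample_rate: float                                   # kHz, e.g. 48
    bit_depth: int                                       # e.g. 24
    codec: str = "PCM"

@dataclass
class RenderingMetadata:
    is_object: bool                                      # object-based vs channel-based
    channel_pos: Optional[str] = None                    # e.g. "Lss" for channel-based
    obj_pos: Optional[Tuple[float, float, float]] = None # 3D position for object-based
    snap_to_speaker: bool = False

@dataclass
class AudioElement:
    essence_track: int                                   # index of the monophonic track
    audio: AudioMetadata
    rendering: RenderingMetadata
```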
[00111] As shown in table 500 of FIG. 5, the first type of metadata is program metadata, which includes metadata elements that specify the frame rate, track count, extensible channel description, and mix stage description. The frame rate metadata element specifies the rate of the audio content frames in units of frames per second (fps). The raw audio format need not include framing of the audio or metadata since the audio is provided as full tracks (duration of a reel or entire feature) rather than audio segments (duration of an object). The raw format does need to carry all the information required to enable the adaptive audio encoder to frame the audio and metadata, including the actual frame rate. Table 1 shows the ID, example values and description of the frame rate metadata element.
TABLE 1
ID: FrameRate
Values: 24, 25, 30, 48, 50, 60, 96, 100, 120, extensible
Description: Indication of intended frame rate (frames/sec) for entire program. Field shall provide efficient coding of common rates, as well as ability to extend to an extensible floating point field with 0.01 resolution.
[00112] The track count metadata element indicates the number of audio tracks in a frame. An example adaptive audio decoder/processor can support up to 128 simultaneous audio tracks, while the adaptive audio format will support any number of audio tracks. Table 2 shows the ID, example values and description of the track count metadata element.
TABLE 2
ID: nTracks
Values: Positive integer, extensible range
Description: Indication of number of audio tracks in the frame.
[00113] Channel-based audio can be assigned to non-standard channels and the extensible channel description metadata element enables mixes to use new channel positions. For each extension channel the following metadata shall be provided as shown in Table 3:
TABLE 3
(The body of Table 3 appears only as an image in the original document and is not reproduced here.)
[00114] The mix stage description metadata element specifies the frequency at which a particular speaker produces half the power of the passband. Table 4 shows the ID, example values and description of the mix stage description metadata element, where LF = Low Frequency; HF = High Frequency; 3dB point = edge of speaker passband.
TABLE 4
(The initial rows of Table 4 appear only as an image in the original document and are not reproduced here.)

ID: MixSpeakerSub
Values: List of (Gain, Speaker number) pairs. Gain is a real value, 0 <= Gain <= 1.0. Speaker number is an integer, 0 <= Speaker number <= nMixSpeakers - 1.
Description: Speaker-to-subwoofer mapping, used to indicate the target subwoofer for bass management of each speaker. Each speaker can be bass managed to more than one sub. Gain indicates the portion of the bass signal that should go to each sub. Gain = 0 indicates end of list, and a Speaker number does not follow. If a speaker is not bass managed, the first Gain value is set to 0.

ID: MixPos
Values: x, y, z coordinates for mix position
Description: Nominal mix position

ID: MixRoomDim
Values: x, y, z for room dimensions (meters)
Description: Nominal mix stage dimensions

ID: MixRoomRT60
Values: Real value < 20
Description: Nominal mix stage RT60

ID: MixScreenDim
Values: x, y, z for screen dimensions (meters)

ID: MixScreenPos
Values: x, y, z for screen center (meters)
[00115] As shown in FIG. 5, the second type of metadata is audio metadata. Each channel-based or object-based audio element consists of audio essence and metadata. The audio essence is a monophonic audio stream carried on one of many audio tracks. The associated metadata describes how the audio essence is stored (audio metadata, e.g., sample rate) or how it should be rendered (rendering metadata, e.g., desired audio source position). In general, the audio tracks are continuous through the duration of the audio program. The program editor or mixer is responsible for assigning audio elements to tracks. The track use is expected to be sparse, i.e. median simultaneous track use may be only 16 to 32. In a typical implementation, the audio will be efficiently transmitted using a lossless encoder. However, alternate implementations are possible, for instance transmitting uncoded audio data or lossily coded audio data. In a typical implementation, the format consists of up to 128 audio tracks where each track has a single sample rate and a single coding system. Each track lasts the duration of the feature (no explicit reel support). The mapping of objects to tracks (time multiplexing) is the responsibility of the content creator (mixer).
[00116] As shown in FIG. 5, the audio metadata includes the elements of sample rate, bit depth, and coding systems. Table 5 shows the ID, example values and description of the sample rate metadata element.
TABLE 5
ID: SampleRate
Values: 16, 24, 32, 44.1, 48, 88.2, 96, and extensible (x1000 samples/sec)
Description: SampleRate field shall provide efficient coding of common rates, as well as ability to extend to an extensible floating point field with 0.01 resolution.
[00117] Table 6 shows the ID, example values and description of the bit depth metadata element (for PCM and lossless compression).
TABLE 6
ID: BitDepth
Values: Positive integer up to 32
Description: Indication of sample bit depth. Samples shall be left justified if the bit depth is smaller than the container (i.e. zero-fill LSBs).
[00118] Table 7 shows the ID, example values and description of the coding system metadata element.
TABLE 7
ID: Codec
Values: PCM, Lossless, extensible
Description: Indication of audio format. Each audio track can be assigned any supported coding type.

ID: GroupNumber
Values: Positive integer
Description: Object grouping information. Applies to Audio Objects and Channel Objects, e.g. to indicate stems.

ID: AudioTyp
Values: {dialog, music, effects, m&e, undef, other}
Description: Audio type. List shall be extensible and include the following: Undefined, Dialog, Music, Effects, Foley, Ambience, Other.

ID: AudioTypTxt
Values: Free text description
[00119] As shown in FIG. 5, the third type of metadata is rendering metadata. The rendering metadata specifies values that help the renderer match as closely as possible the original mixer intent regardless of the playback environment. The set of metadata elements is different for channel-based audio and object-based audio. A first rendering metadata field selects between the two types of audio - channel-based or object-based, as shown in Table 8.
TABLE 8
(The body of Table 8 appears only as an image in the original document and is not reproduced here.)
[00120] The rendering metadata for the channel-based audio comprises a position metadata element that specifies the audio source position as one or more speaker positions. Table 9 shows the ID and values for the position metadata element for the channel-based case.
TABLE 9
ID: ChannelPos
Values: {L, C, R, Ls, Rs, Lss, Rss, Lrs, Rrs, Lts, Rts, Lc, Rc, Crs, Cts, other}
Description: Audio source position is indicated as one of a set of named speaker positions. The set is extensible. Position and extent of extension channel(s) is provided by ExtChanPos and ExtChanWidth.
[00121] The rendering metadata for the channel-based audio also comprises a rendering control element that specifies certain characteristics with regard to playback of channel-based audio, as shown in Table 10.
TABLE 10
(The body of Table 10 appears only as an image in the original document and is not reproduced here.)
[00122] For object-based audio, the metadata includes analogous elements as for the channel-based audio. Table 11 provides the ID and values for the object position metadata element. Object position is described in one of three ways: three-dimensional co-ordinates; a plane and two-dimensional co-ordinates; or a line and a one-dimensional co-ordinate. The rendering method can adapt based on the position information type.
TABLE 11
(The body of Table 11 appears only as an image in the original document and is not reproduced here.)
[00123] The ID and values for the object rendering control metadata elements are shown in Table 12. These values provide additional means to control or optimize rendering for object-based audio.
TABLE 12
(Several rows of Table 12 appear only as images in the original document and are not reproduced here.)

ID: ObjRendAlg
Values: {def, dualBalance, vbap, dbap, 2D, 1D, other}
Description: def: renderer's choice. dualBalance: Dolby method. vbap: vector-based amplitude panning. dbap: distance-based amplitude panning. 2D: in conjunction with ObjPos2D, use vbap with only 3 (virtual) source positions. 1D: in conjunction with ObjPos1D, use pair-wise pan between 2 (virtual) source positions.

ID: ObjZones
Values: Positive real values <= 1
Description: Degree of contribution of any named speaker zone. Supported speaker zones include: L, C, R, Lss, Rss, Lrs, Rrs, Lts, Rts, Lc, Rc. Speaker zone list shall be extensible to support future zones.

ID: ObjLevel
Values: Positive real values <= 2
Description: Alternative Audio Object level for specific Channel Configurations. Channel Configuration list shall be extensible and include 5.1 and Dolby Surround 7.1. Object may be attenuated or eliminated completely when rendering to smaller channel configurations.

ID: ObjSSBias
Description: Indication of screen-to-room bias. Most useful for adjusting the default rendering of alternate playback modes (5.1, 7.1). Considered "optional" because this feature may not require additional metadata; other rendering data could be modified directly (e.g. pan trajectory, downmix matrix).
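For reference, the vbap option named above computes speaker gains so that the gain-weighted speaker directions sum to the desired source direction. A minimal two-speaker (2D) sketch is shown below; it illustrates the general technique, not the renderer's actual implementation.

```python
import math

def vbap_pair_gains(source_az, spk_az_1, spk_az_2):
    """2D vector-based amplitude panning for one speaker pair.
    All angles are in radians. Returns (g1, g2), normalized to constant power."""
    # Unit vectors for the source direction and the two speakers.
    p = (math.cos(source_az), math.sin(source_az))
    l1 = (math.cos(spk_az_1), math.sin(spk_az_1))
    l2 = (math.cos(spk_az_2), math.sin(spk_az_2))
    # Solve p = g1*l1 + g2*l2 by inverting the 2x2 matrix [l1 l2].
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    # Normalize for constant power.
    norm = math.sqrt(g1 * g1 + g2 * g2)
    return g1 / norm, g2 / norm

# A source half-way between speakers at +30 and -30 degrees gets equal gains.
print(vbap_pair_gains(0.0, math.radians(30), math.radians(-30)))
```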
[00124] In an embodiment, the metadata described above and illustrated in FIG. 5 is generated and stored as one or more files that are associated or indexed with corresponding audio content so that audio streams are processed by the adaptive audio system interpreting the metadata generated by the mixer. It should be noted that the metadata described above is an example set of IDs, values, and definitions, and other or additional metadata elements may be included for use in the adaptive audio system.
[00125] In an embodiment, two (or more) sets of metadata elements are associated with each of the channel and object based audio streams. A first set of metadata is applied to the plurality of audio streams for a first condition of the playback environment, and a second set of metadata is applied to the plurality of audio streams for a second condition of the playback environment. The second or subsequent set of metadata elements replaces the first set of metadata elements for a given audio stream based on the condition of the playback environment. The condition may include factors such as room size, shape, composition of material within the room, present occupancy and density of people in the room, ambient noise characteristics, ambient light characteristics, and any other factor that might affect the sound or even mood of the playback environment.
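A minimal sketch of this conditional-metadata idea, assuming a simple representation in which each alternative metadata set names the playback condition it targets; the matching rule and field names are illustrative only and not the format defined by the system.

```python
def select_metadata_set(metadata_sets, playback_condition):
    """Pick the metadata set whose declared condition matches the playback
    environment, falling back to the default (first) set otherwise.

    `metadata_sets` is a hypothetical list of dicts, each with a
    'condition' entry (e.g. {'room_size': 'small'}) plus rendering data."""
    default_set = metadata_sets[0]
    for candidate in metadata_sets[1:]:
        condition = candidate.get("condition", {})
        if all(playback_condition.get(k) == v for k, v in condition.items()):
            return candidate
    return default_set

sets = [
    {"condition": {}, "gain_db": 0.0},                      # default mix levels
    {"condition": {"room_size": "small"}, "gain_db": -3.0}, # small-room variant
]
print(select_metadata_set(sets, {"room_size": "small", "occupancy": "full"}))
```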
Post-Production and Mastering
[00126] The rendering stage 110 of the adaptive audio processing system 100 may include audio post-production steps that lead to the creation of a final mix. In a cinema application, the three main categories of sound used in a movie mix are dialogue, music, and effects.
Effects consist of sounds that are not dialogue or music (e.g., ambient noise,
background/scene noise). Sound effects can be recorded or synthesized by the sound designer or they can be sourced from effects libraries. A sub-group of effects that involve specific noise sources (e.g., footsteps, doors, etc.) are known as Foley and are performed by
Foley actors. The different types of sound are marked and panned accordingly by the recording engineers.
[00127] FIG. 6 illustrates an example workflow for a post-production process in an adaptive audio system, under an embodiment. As shown in diagram 600, all of the individual sound components of music, dialogue, Foley, and effects are brought together in the dubbing theatre during the final mix 606, and the re-recording mixer(s) 604 use the premixes (also known as the 'mix minus') along with the individual sound objects and positional data to create stems as a way of grouping, for example, dialogue, music, effects, Foley and background sounds. In addition to forming the final mix 606, the music and all effects stems can be used as a basis for creating dubbed language versions of the movie. Each stem consists of a channel-based bed and several audio objects with metadata. Stems combine to form the final mix. Using object panning information from both the audio workstation and the mixing console, the rendering and mastering unit 608 renders the audio to the speaker locations in the dubbing theatre. This rendering allows the mixers to hear how the channel-based beds and audio objects combine, and also provides the ability to render to different configurations. The mixer can use conditional metadata, which default to relevant profiles, to control how the content is rendered to surround channels. In this way, the mixers retain complete control of how the movie plays back in all the scalable environments. A monitoring step may be included after either or both of the re-recording step 604 and the final mix step 606 to allow the mixer to hear and evaluate the intermediate content generated during each of these stages.
[00128] During the mastering session, the stems, objects, and metadata are brought together in an adaptive audio package 614, which is produced by the printmaster 610. This package also contains the backward-compatible (legacy 5.1 or 7.1) surround sound theatrical mix 612. The rendering/mastering unit (RMU) 608 can render this output if desired; thereby eliminating the need for any additional workflow steps in generating existing channel-based deliverables. In an embodiment, the audio files are packaged using standard Material Exchange Format (MXF) wrapping. The adaptive audio mix master file can also be used to generate other deliverables, such as consumer multi-channel or stereo mixes. The intelligent profiles and conditional metadata allow controlled renderings that can significantly reduce the time required to create such mixes.
[00129] In an embodiment, a packaging system can be used to create a digital cinema package for the deliverables including an adaptive audio mix. The audio track files may be locked together to help prevent synchronization errors with the adaptive audio track files.
Certain territories require the addition of track files during the packaging phase, for instance, the addition of Hearing Impaired (HI) or Visually Impaired Narration (VI-N) tracks to the main audio track file. [00130] In an embodiment, the speaker array in the playback environment may comprise any number of surround-sound speakers placed and designated in accordance with established surround sound standards. Any number of additional speakers for accurate rendering of the object-based audio content may also be placed based on the condition of the playback environment. These additional speakers may be set up by a sound engineer, and this set up is provided to the system in the form of a set-up file that is used by the system for rendering the object-based components of the adaptive audio to a specific speaker or speakers within the overall speaker array. The set-up file includes at least a list of speaker designations and a mapping of channels to individual speakers, information regarding grouping of speakers, and a run-time mapping based on a relative position of speakers to the playback environment. The run-time mapping is utilized by a snap-to feature of the system that renders point source object-based audio content to a specific speaker that is nearest to the perceived location of the sound as intended by the sound engineer.
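To illustrate the set-up file described above, a hypothetical example is sketched below containing the elements the paragraph lists (speaker designations, channel-to-speaker mapping, grouping, and relative positions for the run-time mapping). The actual file format is not specified here, so both the syntax and the field names are assumptions.

```python
# Hypothetical playback-environment set-up description (illustrative only).
setup_file = {
    "speakers": [
        {"name": "L",    "position_m": [-3.0, 4.0, 1.2]},
        {"name": "R",    "position_m": [ 3.0, 4.0, 1.2]},
        {"name": "C",    "position_m": [ 0.0, 4.0, 1.2]},
        {"name": "Lss1", "position_m": [-4.0, 1.0, 2.0]},
        {"name": "Rss1", "position_m": [ 4.0, 1.0, 2.0]},
        {"name": "Lts1", "position_m": [-2.0, 0.0, 3.5]},  # top surround
    ],
    # Channel-to-speaker mapping for channel-based beds.
    "channel_map": {"L": ["L"], "R": ["R"], "C": ["C"],
                    "Lss": ["Lss1"], "Rss": ["Rss1"]},
    # Speaker groupings (e.g. arrays driven together for bed channels).
    "groups": {"side_surround_left": ["Lss1"], "top_surround": ["Lts1"]},
    # Run-time mapping data: speaker positions are given relative to the
    # reference listening position so object-based content can snap to the
    # nearest physical speaker.
    "reference_position_m": [0.0, 0.0, 1.2],
}
```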
[00131] FIG. 7 is a diagram of an example workflow for a digital cinema packaging process using adaptive audio files, under an embodiment. As shown in diagram 700, the audio files comprising both the adaptive audio files and the 5.1 or 7.1 surround sound audio files are input to a wrapping/encryption block 704. In an embodiment, upon creation of the digital cinema package in block 706, the PCM MXF file (with appropriate additional tracks appended) is encrypted using SMPTE specifications in accordance with existing practice. The adaptive audio MXF is packaged as an auxiliary track file, and is optionally encrypted using a symmetric content key per the SMPTE specification. This single DCP 708 can then be delivered to any Digital Cinema Initiatives (DCI) compliant server. In general, any installations that are not suitably equipped will simply ignore the additional track file containing the adaptive audio soundtrack, and will use the existing main audio track file for standard playback. Installations equipped with appropriate adaptive audio processors will be able to ingest and replay the adaptive audio soundtrack where applicable, reverting to the standard audio track as necessary. The wrapping/encryption component 704 may also provide input directly to a distribution KDM block 710 for generating an appropriate security key for use in the digital cinema server. Other movie elements or files, such as subtitles 714 and images 716 may be wrapped and encrypted along with the audio files 702. In this case, certain processing steps may be included, such as compression 712 in the case of image files 716.
[00132] With respect to content management, the adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a great deal of flexibility in the content management of audio. From a content management standpoint, adaptive audio methods enable several different features. These include changing the language of content by only replacing the dialog object for space saving, download efficiency, geographical playback adaptation, etc. Film, television and other entertainment programs are typically distributed internationally. This often requires that the language in the piece of content be changed depending on where it will be reproduced (French for films being shown in France, German for TV programs being shown in Germany, etc.). Today this often requires a completely independent audio soundtrack to be created, packaged and distributed. With adaptive audio and its inherent concept of audio objects, the dialog for a piece of content could be an independent audio object. This allows the language of the content to be easily changed without updating or altering other elements of the audio soundtrack such as music, effects, etc. This would not only apply to foreign languages but also inappropriate language for certain audiences (e.g., children's television shows, airline movies, etc.), targeted advertising, and so on.
Installation and Equipment Considerations
[00133] The adaptive audio file format and associated processors allow for changes in how theatre equipment is installed, calibrated and maintained. With the introduction of many more potential speaker outputs, each individually equalized and balanced, there is a need for intelligent and time-efficient automatic room equalization, which may be complemented by the ability to manually adjust any automated room equalization. In an embodiment, the adaptive audio system uses an optimized 1/12th octave band equalization engine. Up to 64 outputs can be processed to more accurately balance the sound in the theatre. The system also allows scheduled monitoring of the individual speaker outputs, from cinema processor output right through to the sound reproduced in the auditorium. Local or network alerts can be created to ensure that appropriate action is taken. The flexible rendering system may automatically remove a damaged speaker or amplifier from the replay chain and render around it, allowing the show to go on.
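For context on the 1/12th-octave equalization engine mentioned above, the band center frequencies of a fractional-octave analyzer follow a simple geometric series. The sketch below generates them for the audible range; the choice of 1 kHz as the anchor frequency is a common convention and is assumed here.

```python
def twelfth_octave_centers(f_low=20.0, f_high=20000.0, anchor_hz=1000.0):
    """Center frequencies of 1/12th-octave bands covering [f_low, f_high].
    Adjacent centers are spaced by a factor of 2**(1/12) (about 5.9%)."""
    step = 2.0 ** (1.0 / 12.0)
    centers = []
    f = anchor_hz
    while f / step >= f_low:          # walk down from the anchor frequency
        f /= step
    while f <= f_high:                # then walk up through the range
        if f >= f_low:
            centers.append(round(f, 1))
        f *= step
    return centers

bands = twelfth_octave_centers()
print(len(bands), bands[:3], bands[-3:])  # roughly 120 bands from 20 Hz to 20 kHz
```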
[00134] The cinema processor can be connected to the digital cinema server with existing 8xAES main audio connections, and an Ethernet connection for streaming adaptive audio data. Playback of surround 7.1 or 5.1 content uses the existing PCM connections. The adaptive audio data is streamed over Ethernet to the cinema processor for decoding and rendering, and communication between the server and the cinema processor allows the audio to be identified and synchronized. In the event of any issue with the adaptive audio track playback, sound is reverted back to the Dolby Surround 7.1 or 5.1 PCM audio.
[00135] Although embodiments have been described with regard to 5.1 and 7.1 surround sound systems, it should be noted that many other present and future surround configurations may be used in conjunction with embodiments including 9.1, 11.1 and 13.1 and beyond.
[00136] The adaptive audio system is designed to allow both content creators and exhibitors to decide how sound content is to be rendered in different playback speaker configurations. The ideal number of speaker output channels used will vary according to room size. Recommended speaker placement is thus dependent on many factors, such as size, composition, seating configuration, environment, average audience sizes, and so on.
Example or representative speaker configurations and layouts are provided herein for purposes of illustration only, and are not intended to limit the scope of any claimed embodiments.
[00137] The recommended layout of speakers for an adaptive audio system remains compatible with existing cinema systems, which is vital so as not to compromise the playback of existing 5.1 and 7.1 channel-based formats. In order to preserve the intent of the adaptive audio sound engineer, and the intent of mixers of 7.1 and 5.1 content, the positions of existing screen channels should not be altered too radically in an effort to heighten or accentuate the introduction of new speaker locations. Without requiring the use of all 64 available output channels, the adaptive audio format is capable of being accurately rendered in the cinema to speaker configurations such as 7.1, thus allowing the format (and associated benefits) to be used in existing theatres with no change to amplifiers or speakers.
[00138] Different speaker locations can have different effectiveness depending on the theatre design, thus there is at present no industry-specified ideal number or placement of channels. The adaptive audio is intended to be truly adaptable and capable of accurate playback in a variety of auditoriums, whether they have a limited number of playback channels or many channels with highly flexible configurations.
[00139] FIG. 8 is an overhead view 800 of an example layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium, and FIG. 9 is a front view 900 of the example layout of suggested speaker locations at the screen of the auditorium. The reference position referred to hereafter corresponds to a position 2/3 of the distance back from the screen to the rear wall, on the center line of the screen. Standard screen speakers 801 are shown in their usual positions relative to the screen. Studies of the perception of elevation in the screen plane have shown that additional speakers 804 behind the screen, such as Left Center (Lc) and Right Center (Rc) screen speakers (in the locations of Left Extra and Right Extra channels in 70 mm film formats), can be beneficial in creating smoother pans across the screen. Such optional speakers, particularly in auditoria with screens greater than 12 m (40 ft.) wide are thus recommended. All screen speakers should be angled such that they are aimed towards the reference position. The recommended placement of the subwoofer 810 behind the screen should remain unchanged, including maintaining asymmetric cabinet placement, with respect to the center of the room, to prevent stimulation of standing waves. Additional subwoofers 816 may be placed at the rear of the theatre.
[00140] Surround speakers 802 should be individually wired back to the amplifier rack, and be individually amplified where possible with a dedicated channel of power amplification matching the power handling of the speaker in accordance with the manufacturer's specifications. Ideally, surround speakers should be specified to handle an increased SPL for each individual speaker, and also with wider frequency response where possible. As a rule of thumb for an average-sized theatre, the spacing of surround speakers should be between 2 and 3 m (6'6" and 9'9"), with left and right surround speakers placed symmetrically.
However, the spacing of surround speakers is most effectively considered as angles subtended from a given listener between adjacent speakers, as opposed to using absolute distances between speakers. For optimal playback throughout the auditorium, the angular distance between adjacent speakers should be 30 degrees or less, referenced from each of the four corners of the prime listening area. Good results can be achieved with spacing up to 50 degrees. For each surround zone, the speakers should maintain equal linear spacing adjacent to the seating area where possible. The linear spacing beyond the listening area, e.g. between the front row and the screen, can be slightly larger. FIG. 11 is an example of a positioning of top surround speakers 808 and side surround speakers 806 relative to the reference position, under an embodiment.
[00141] Additional side surround speakers 806 should be mounted closer to the screen than the currently recommended practice of starting approximately one-third of the distance to the back of the auditorium. These speakers are not used as side surrounds during playback of Dolby Surround 7.1 or 5.1 soundtracks, but will enable smooth transition and improved timbre matching when panning objects from the screen speakers to the surround zones. To maximize the impression of space, the surround arrays should be placed as low as practical, subject to the following constraints: the vertical placement of surround speakers at the front of the array should be reasonably close to the height of screen speaker acoustic center, and high enough to maintain good coverage across the seating area according to the directivity of the speaker. The vertical placement of the surround speakers should be such that they form a straight line from front to back, and (typically) slanted upward so the relative elevation of surround speakers above the listeners is maintained toward the back of the cinema as the seating elevation increases, as shown in FIG. 10, which is a side view of an example layout of suggested speaker locations for use with an adaptive audio system in the typical auditorium. In practice, this can be achieved most simply by choosing the elevation for the front-most and rear-most side surround speakers, and placing the remaining speakers in a line between these points.
[00142] In order to provide optimum coverage for each speaker over the seating area, the side surround 806 and rear speakers 816 and top surrounds 808 should be aimed towards the reference position in the theatre, under defined guidelines regarding spacing, position, angle, and so on.
[00143] Embodiments of the adaptive audio cinema system and format achieve improved levels of audience immersion and engagement over present systems by offering powerful new authoring tools to mixers, and a new cinema processor featuring a flexible rendering engine that optimizes the audio quality and surround effects of the soundtrack to each room's speaker layout and characteristics. In addition, the system maintains backwards compatibility and minimizes the impact on the current production and distribution workflows.
[00144] Although embodiments have been described with respect to examples and implementations in a cinema environment in which the adaptive audio content is associated with film content for use in digital cinema processing systems, it should be noted that embodiments may also be implemented in non-cinema environments. The adaptive audio content comprising object-based audio and channel-based audio may be used in conjunction with any related content (associated audio, video, graphic, etc.), or it may constitute standalone audio content. The playback environment may be any appropriate listening environment from headphones or near field monitors to small or large rooms, cars, open air arenas, concert halls, and so on.
[00145] Aspects of the system 100 may be implemented in an appropriate computer-based sound processing network environment for processing digital or digitized audio files.
Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. In an embodiment in which the network comprises the Internet, one or more machines may be configured to access the Internet through web browser programs.
[00146] One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor- based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer- readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non- volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
[00147] Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
[00148] While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

Claims

CLAIMS: What is claimed is:
1. A system for processing audio signals, comprising:
an authoring component configured to receive a plurality of audio signals, and to generate a plurality of monophonic audio streams and one or more metadata sets associated with each of the audio streams and specifying a playback location of a respective audio stream, wherein the audio streams are identified as either channel-based audio or object- based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space; and further wherein a first set of metadata is applied by default to one or more of the plurality of audio streams, and a second set of metadata is associated with a specific condition of a playback environment and is applied to the one or more of the plurality of audio streams instead of the first set if a condition of the playback environment matches the specific condition of the playback environment; and
a rendering system coupled to the authoring component and configured to receive a bitstream encapsulating the plurality of monophonic audio streams and the one or more metadata sets, and to render the audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the one or more metadata sets based on the condition of the playback environment.
2. The system of claim 1 wherein each metadata set includes metadata elements associated with each object-based stream, the metadata elements for each object-based stream specifying spatial parameters controlling the playback of a corresponding object-based sound, and comprising one or more of: sound position, sound width, and sound velocity; and further wherein each metadata set includes metadata elements associated with each channel-based stream, and the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprises designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard.
3. The system of claim 1 wherein the speaker array includes additional speakers for playback of object-based streams that are positioned in the playback environment in accordance with set up instructions from a user based on the condition of the playback environment, and wherein the playback condition depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise; and further wherein the system receives a set-up file from the user that includes at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, information regarding grouping of speakers, and a run-time mapping based on a relative position of speakers to the playback environment.
4. The system of claim 1 wherein the authoring component includes a mixing console having controls operable by the user to specify playback levels of the audio streams comprising the original audio content, and wherein the metadata elements associated with each respective object-based stream are automatically generated upon input to the mixing console controls by the user.
5. The system of claim 1 wherein the metadata sets include metadata to enable upmixing or downmixing of at least one of the channel-based audio streams and the object-based audio streams in accordance with a change from a first configuration of the speaker array to a second configuration of the speaker array.
6. The system of claim 3 wherein the content type is selected from the group consisting of: dialog, music, and effects, and each content type is embodied in a respective set of channel-based streams or object-based streams, and further wherein sound components of each content type are transmitted to defined speaker groups of one or more speaker groups designated within the speaker array.
7. The system of claim 6 wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based stream specify that one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, as indicated by the position metadata.
8. The system of claim 1 wherein the playback location comprises a spatial position relative to a screen within the playback environment, or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, right plane, an upper plane, and a lower plane.
9. The system of claim 1 further comprising a codec coupled to the authoring component and the rendering component and configured to receive the plurality of audio streams and metadata and to generate a single digital bitstream containing the plurality of audio streams in an ordered fashion.
10. The system of claim 9 wherein the rendering component further comprises means for selecting a rendering algorithm utilized by the rendering component, the rendering algorithm selected from the group consisting of: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata, dual balance, and vector-based amplitude panning.
11. The system of claim 1 wherein the playback location for each of the audio streams is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein an egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.
12. A system for processing audio signals, comprising:
an authoring component configured to receive a plurality of audio signals and to generate a plurality of monophonic audio streams and metadata associated with each of the audio streams and specifying a playback location of a respective audio stream, wherein the audio streams are identified as either channel-based audio or object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space, and wherein each object-based audio stream is rendered in at least one specific speaker of the speaker array; and
a rendering system coupled to the authoring component and configured to receive a bitstream encapsulating the plurality of monophonic audio streams and metadata, and to render the audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based stream specify that one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, such that an object-based stream effectively snaps to the speaker nearest to the intended playback location.
13. The system of claim 12 wherein the metadata comprises two or more metadata sets, and the rendering system renders the audio streams in accordance with one of the two or more metadata sets based on a condition of the playback environment, wherein a first set of metadata is applied to one or more of the plurality of audio streams for a first condition of the playback environment, and a second set of metadata is applied to the one or more plurality of audio streams for a second condition of the playback environment; and wherein each metadata set includes metadata elements associated with each object-based stream, the metadata elements for each object-based stream specifying spatial parameters controlling the playback of a corresponding object-based sound, and comprising one or more of: sound position, sound width, and sound velocity; and further wherein each metadata set includes metadata elements associated with each channel-based stream, and the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprise designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard.
14. The system of claim 12 wherein the speaker array includes additional speakers for playback of object-based streams that are positioned in the playback environment in accordance with set up instructions from a user based on a condition of the playback environment, and wherein the playback condition depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise; and further wherein the system receives a set-up file from the user that includes at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, information regarding grouping of speakers, and a run-time mapping based on a relative position of speakers to the playback environment, and wherein an object stream rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component snaps to a single speaker of the additional speakers.
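To make the set-up file concrete, a hypothetical layout is sketched below with speaker designations, a channel-to-speaker map, speaker groupings, positions for the run-time mapping, and a few room variables; every field name here is an assumption, not a format defined by the claims.

```python
import json

setup = {
    "speakers": [
        {"name": "L",   "channel": 1, "group": "screen",   "position": [-2.0,  3.0, 1.2]},
        {"name": "R",   "channel": 2, "group": "screen",   "position": [ 2.0,  3.0, 1.2]},
        {"name": "Ts1", "channel": 9, "group": "overhead", "position": [ 0.0,  0.0, 2.6]},
    ],
    "room": {"dimensions_m": [10.0, 8.0, 4.0], "ambient_noise_db": 35},
}
print(json.dumps(setup, indent=2))
```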
15. The system of claim 14 wherein the intended playback location comprises a spatial position relative to a screen within the playback environment or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, a top plane, and a floor plane.
16. A system for processing audio signals, comprising:
an authoring component configured to receive a plurality of audio signals and to generate a plurality of monophonic audio streams and metadata associated with each of the audio streams and specifying a playback location of a respective audio stream, wherein the audio streams are identified as either channel-based audio or object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array, and wherein each object-based audio stream is rendered in at least one specific speaker of the speaker array; and
a rendering system coupled to the authoring component and configured to receive a first map of speakers to audio channels comprising a list of speakers and their respective locations within the playback environment and a bitstream encapsulating the plurality of monophonic audio streams and metadata, and to render the audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with a run-time mapping based on a relative position of speakers to the playback environment and a condition of the playback environment.
17. The system of claim 16 wherein the condition of the playback environment depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise.
18. The system of claim 17 wherein the first map is specified in a set-up file that includes at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, and information regarding grouping of speakers.
19. The system of claim 18 wherein the intended playback location comprises a spatial position relative to a screen within the playback environment or a surface of an enclosure containing the playback environment, and wherein the surface comprises one of: a front plane, a back plane, a side plane, a top plane, and a floor plane of the enclosure.
20. The system of claim 19 wherein the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprise designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard, and further wherein certain object-based streams are played through additional speakers of the speaker array, and wherein the run-time mapping dynamically determines which individual speakers of the speaker array play back a corresponding object-based stream during a playback process.
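The run-time mapping referred to in claims 16-20 can be pictured as a function that decides, during playback, which speaker feeds carry a given object; the radius rule below is only one plausible policy, chosen for illustration.

```python
def runtime_mapping(object_pos, speaker_map, max_distance):
    """Return the speakers that should play an object during playback: every
    speaker within max_distance of the intended location, or the single nearest
    speaker if none is that close."""
    def dist(spk):
        return sum((a - b) ** 2 for a, b in zip(spk["position"], object_pos)) ** 0.5

    nearby = [s["name"] for s in speaker_map if dist(s) <= max_distance]
    return nearby if nearby else [min(speaker_map, key=dist)["name"]]

speaker_map = [
    {"name": "L", "position": (-2.0, 3.0, 1.2)},
    {"name": "C", "position": ( 0.0, 3.2, 1.2)},
    {"name": "R", "position": ( 2.0, 3.0, 1.2)},
]
print(runtime_mapping((0.3, 3.0, 1.2), speaker_map, max_distance=1.5))  # -> ['C']
```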
21. A method of authoring audio signals for rendering, comprising:
receiving a plurality of audio signals;
generating a plurality of monophonic audio streams and one or more metadata sets associated with each of the audio streams and specifying a playback location of a respective audio stream, wherein the audio streams are identified as either channel-based audio or object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and further wherein a first set of metadata is applied to one or more of the plurality of audio streams for a first condition of the playback environment, and a second set of metadata is applied to the one or more of the plurality of audio streams for a second condition of the playback environment; and
encapsulating the plurality of monophonic audio streams and the one or more metadata sets in a bitstream for transmission to a rendering system configured to render the audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the one or more metadata sets based on a condition of the playback environment.
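As a rough mental model of the encapsulation step only, the toy container below packs a JSON metadata header followed by length-prefixed 16-bit sample blocks; it is not the bitstream syntax defined by the claims or by any deliverable format.

```python
import json
import struct

def encapsulate(streams, metadata_sets):
    """Pack monophonic sample lists and their metadata sets into one byte string:
    a length-prefixed JSON header, then one length-prefixed block per stream."""
    header = json.dumps({"metadata_sets": metadata_sets,
                         "stream_count": len(streams)}).encode()
    payload = struct.pack(">I", len(header)) + header
    for samples in streams:
        block = struct.pack(f">{len(samples)}h", *samples)
        payload += struct.pack(">I", len(block)) + block
    return payload

streams = [[0, 1000, -1000, 500], [200, -200, 0, 0]]   # two short mono streams
metadata_sets = [{"condition": "default",
                  "objects": {"obj1": {"position": [0.5, 0.5, 0.5]}}}]
print(len(encapsulate(streams, metadata_sets)), "bytes")
```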
22. The method of claim 21 wherein each metadata set includes metadata elements associated with each object-based stream, the metadata elements for each object-based stream specifying spatial parameters controlling the playback of a corresponding object-based sound, and comprising one or more of: sound position, sound width, and sound velocity; and further wherein each metadata set includes metadata elements associated with each channel-based stream, and the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprise designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard.
23. The method of claim 21 wherein the speaker array includes additional speakers for playback of object-based streams that are positioned in the playback environment, the method further comprising receiving set up instructions from a user based on the condition of the playback environment, and wherein the playback condition depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise; the setup instructions further including at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, information regarding grouping of speakers, and a run-time mapping based on a relative position of speakers to the playback environment.
24. The method of claim 23 further comprising:
receiving user input from a mixing console having controls operated by a user to specify playback levels of the audio streams comprising the original audio content; and
automatically generating the metadata elements associated with each respective object-based stream upon receipt of the user input.
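A small sketch of how metadata elements might be derived automatically from console controls, assuming one fader level (in dB) and one pan position per object stream; the dB-to-linear conversion and default position are illustrative choices.

```python
def metadata_from_console(fader_levels_db, pan_positions):
    """Build per-object metadata elements from mixing-console controls."""
    metadata = {}
    for obj_id, level_db in fader_levels_db.items():
        metadata[obj_id] = {
            "gain": 10.0 ** (level_db / 20.0),              # dB fader -> linear gain
            "position": pan_positions.get(obj_id, (0.5, 0.5, 0.0)),
        }
    return metadata

print(metadata_from_console({"dialog": -6.0, "fx_door": 0.0},
                            {"fx_door": (0.9, 0.8, 0.2)}))
```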
25. A method of rendering audio signals, comprising:
receiving a bitstream encapsulating a plurality of monophonic audio streams and one or more metadata sets in a bitstream from an authoring component configured to receive a plurality of audio signals, and generate a plurality of monophonic audio streams and one or more metadata sets associated with each of the audio streams and specifying a playback location of a respective audio stream, wherein the audio streams are identified as either channel-based audio or object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and further wherein a first set of metadata is applied to one or more of the plurality of audio streams for a first condition of the playback environment, and a second set of metadata is applied to the one or more of the plurality of audio streams for a second condition of the playback environment; and
rendering the plurality of audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the one or more metadata sets based on a condition of the playback environment.
26. A method of creating audio content comprising a plurality of monophonic audio streams processed in an authoring component, wherein the monophonic audio streams comprise at least one channel-based audio stream and at least one object-based audio stream, the method comprising:
indicating whether each audio stream of the plurality of audio streams is a channel-based stream or an object-based stream;
associating with each channel-based stream a metadata element specifying a channel position for rendering the respective channel-based stream to one or more speakers within a playback environment;
associating with each object-based stream one or more metadata elements specifying an object-based position for rendering the respective object-based stream to one or more speakers within the playback environment with respect to an allocentric frame of reference defined with respect to size and dimensions of the playback environment; and
assembling the plurality of monophonic streams and associated metadata into a signal.
27. The method of claim 26 wherein the playback environment includes an array of speakers placed at defined locations and orientations relative to a reference point of an enclosure embodying the playback environment.
28. The method of claim 27 wherein a first set of speakers of the array of speakers comprises speakers arranged according to a defined surround sound system, and wherein a second set of speakers of the array of speakers comprises speakers arranged according to an adaptive audio scheme.
29. The method of claim 28 further comprising:
defining an audio type for sets of the plurality of monophonic audio streams, wherein the audio type is selected from the group consisting of dialog, music, and effects; and
transmitting the sets of audio streams to specific sets of speakers based on the audio type for a respective set of audio streams.
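The routing by audio type can be illustrated as a simple lookup from type to speaker group, as sketched below; the group memberships are assumptions for the example.

```python
def route_by_audio_type(stream_sets, speaker_groups):
    """Send each set of streams to the speaker group assigned to its audio type
    (dialog, music or effects)."""
    return {set_id: speaker_groups[info["audio_type"]]
            for set_id, info in stream_sets.items()}

stream_sets = {
    "set_a": {"audio_type": "dialog",  "streams": ["dlg1", "dlg2"]},
    "set_b": {"audio_type": "effects", "streams": ["fx1"]},
}
speaker_groups = {
    "dialog":  ["C"],
    "music":   ["L", "R", "Ls", "Rs"],
    "effects": ["L", "R", "Ls", "Rs", "Ts1", "Ts2"],
}
print(route_by_audio_type(stream_sets, speaker_groups))
```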
30. The method of claim 29 further comprising automatically generating the metadata elements through an authoring component implemented in a mixing console having controls operable by a user to specify playback levels of the monophonic audio streams.
31. The method of claim 30 further comprising packaging the plurality of monophonic audio streams and associated metadata elements into a single digital bitstream within an encoder.
32. A method of creating audio content comprising:
determining values of one or more metadata elements in a first metadata group associated with programming of the audio content for processing in a hybrid audio system handling both channel-based and object-based audio content;
determining values of one or more metadata elements in a second metadata group associated with storage and rendering characteristics of the audio content in the hybrid audio system; and
determining values of one or more metadata elements in a third metadata group associated with audio source position and control information for rendering the channel-based and object-based audio content.
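One possible shape for the three metadata groups of this method is sketched below; every field name is an assumption made only to show how programming, storage/rendering, and position/control information could be kept apart.

```python
hybrid_metadata = {
    "program": {                       # first group: programming of the audio content
        "title": "Feature mix", "frame_rate": 24, "language": "en",
    },
    "storage_and_rendering": {         # second group: storage / rendering characteristics
        "sample_rate_hz": 48000, "bit_depth": 24, "downmix_allowed": True,
    },
    "source_position_and_control": {   # third group: per-stream position and control
        "channels": {"L": {"speaker": "L"}, "R": {"speaker": "R"}},
        "objects":  {"obj1": {"position": [0.2, 0.8, 0.5], "snap_to_speaker": False}},
    },
}
print(sorted(hybrid_metadata))
```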
33. The method of claim 32 wherein the audio source position for rendering the channel-based audio content comprises names associated with speakers in a surround sound speaker system, wherein the names define a location of respective speakers relative to one or more reference positions in the playback environment.
34. The method of claim 33 wherein the control information for rendering the channel-based audio content comprises upmix and downmix information for rendering audio content in different surround sound configurations, and wherein the metadata includes metadata for enabling or disabling an upmix and/or downmix function.
35. The method of claim 32 wherein the audio source position for rendering the object-based audio content comprises values associated with one or more mathematical functions specifying an intended playback location for playback of a sound component of the object-based audio content.
36. The method of claim 35 wherein the mathematical functions are selected from the group consisting of: three-dimensional coordinates specified as x, y, z coordinate values, a surface definition plus a set of two-dimensional coordinates, a curve definition plus a one-dimensional linear position coordinate, and a scalar position on a screen in the playback environment.
37. The method of claim 36 wherein the control information for rendering the object-based audio content comprises values specifying individual speakers or speaker groups within the playback environment through which the sound component is played.
38. The method of claim 37 wherein the control information for rendering the object-based audio content further comprises a binary value specifying the sound source to be snapped to a nearest speaker or nearest speaker group within the playback environment.
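The positional encodings of claims 35-38 can be pictured as alternative ways of producing a single x, y, z point plus an optional snap flag; the sketch below uses normalized room coordinates and invented field names purely for illustration.

```python
def resolve_object_position(position_spec):
    """Turn the alternative positional encodings into one (x, y, z) point in
    normalized 0..1 room coordinates."""
    kind = position_spec["kind"]
    if kind == "xyz":                         # three-dimensional coordinates
        return tuple(position_spec["value"])
    if kind == "surface_2d":                  # named surface plus 2-D coordinates
        u, v = position_spec["value"]
        return {"front": (u, 1.0, v), "floor": (u, v, 0.0)}[position_spec["surface"]]
    if kind == "curve_1d":                    # curve definition plus 1-D linear position
        return position_spec["curve"](position_spec["value"])
    if kind == "screen_scalar":               # scalar position on the screen
        return (position_spec["value"], 1.0, 0.5)
    raise ValueError(f"unknown position kind: {kind}")

arc = lambda t: (t, 1.0 - t, 0.8)             # an example curve definition
print(resolve_object_position({"kind": "xyz", "value": [0.3, 0.7, 0.5]}))
print(resolve_object_position({"kind": "curve_1d", "curve": arc, "value": 0.25}))
print(resolve_object_position({"kind": "screen_scalar", "value": 0.9}))
```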
39. A method of defining an audio transport protocol comprising:
defining values of one or more metadata elements in a first metadata group associated with programming of the audio content for processing in a hybrid audio system handling both channel-based and object-based audio content;
defining values of one or more metadata elements in a second metadata group associated with storage and rendering characteristics of the audio content in the hybrid audio system; and
defining values of one or more metadata elements in a third metadata group associated with audio source position and control information for rendering the channel-based and object-based audio content.
PCT/US2012/044388 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering WO2013006338A2 (en)

Priority Applications (49)

Application Number Priority Date Filing Date Title
KR1020197003234A KR102003191B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020237041109A KR20230170110A (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
BR122020001361-3A BR122020001361B1 (en) 2011-07-01 2012-06-27 System for processing audio signals, system for processing audio signals, and method for rendering audio signals
IL302167A IL302167A (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
CN201280032058.3A CN103650539B (en) 2011-07-01 2012-06-27 The system and method for produce for adaptive audio signal, encoding and presenting
IL295733A IL295733B2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020197020510A KR102115723B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
US14/130,386 US9179236B2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020147037035A KR101845226B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
MX2013014684A MX2013014684A (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering.
KR1020207034194A KR102406776B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
AU2012279357A AU2012279357B2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
IL291043A IL291043B2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020137034894A KR101685447B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
JP2014518958A JP5912179B2 (en) 2011-07-01 2012-06-27 Systems and methods for adaptive audio signal generation, coding, and rendering
UAA201400839A UA114793C2 (en) 2012-04-20 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
PL12743261T PL2727383T3 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
ES12743261T ES2871224T3 (en) 2011-07-01 2012-06-27 System and method for the generation, coding and computer interpretation (or rendering) of adaptive audio signals
DK12743261.5T DK2727383T3 (en) 2011-07-01 2012-06-27 SYSTEM AND METHOD OF ADAPTIVE AUDIO SIGNAL GENERATION, CODING AND PLAYBACK
BR112013033386-3A BR112013033386B1 (en) 2011-07-01 2012-06-27 system and method for adaptive audio signal generation, encoding, and rendering
EP21169907.9A EP3893521A1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
CA2837893A CA2837893C (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
EP12743261.5A EP2727383B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020187008804A KR101946795B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
RU2013158054A RU2617553C2 (en) 2011-07-01 2012-06-27 System and method for generating, coding and presenting adaptive sound signal data
KR1020207014372A KR102185941B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
KR1020227018617A KR102608968B1 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
IL230046A IL230046A (en) 2011-07-01 2013-12-19 System and method for adaptive audio signal generation, coding and rendering
US14/866,350 US9467791B2 (en) 2011-07-01 2015-09-25 System and method for adaptive audio signal generation, coding and rendering
AU2016202227A AU2016202227B2 (en) 2011-07-01 2016-04-11 System and Method for Adaptive Audio Signal Generation, Coding and Rendering
IL245574A IL245574A0 (en) 2011-07-01 2016-05-10 System and method for adaptive audio signal generation, coding and rendering
US15/263,279 US9622009B2 (en) 2011-07-01 2016-09-12 System and method for adaptive audio signal generation, coding and rendering
US15/483,806 US9800991B2 (en) 2011-07-01 2017-04-10 System and method for adaptive audio signal generation, coding and rendering
US15/672,656 US9942688B2 (en) 2011-07-01 2017-08-09 System and method for adaptive audio signal generation, coding and rendering
US15/905,536 US10057708B2 (en) 2011-07-01 2018-02-26 System and method for adaptive audio signal generation, coding and rendering
AU2018203734A AU2018203734B2 (en) 2011-07-01 2018-05-28 System and Method for Adaptive Audio Signal Generation, Coding and Rendering
US16/035,262 US10165387B2 (en) 2011-07-01 2018-07-13 System and method for adaptive audio signal generation, coding and rendering
US16/207,006 US10327092B2 (en) 2011-07-01 2018-11-30 System and method for adaptive audio signal generation, coding and rendering
IL265741A IL265741B (en) 2011-07-01 2019-04-01 System and method for adaptive audio signal generation, coding and rendering
AU2019204012A AU2019204012B2 (en) 2011-07-01 2019-06-07 System and method for adaptive audio signal generation, coding and rendering
US16/443,268 US10477339B2 (en) 2011-07-01 2019-06-17 System and method for adaptive audio signal generation, coding and rendering
US16/679,945 US10904692B2 (en) 2011-07-01 2019-11-11 System and method for adaptive audio signal generation, coding and rendering
AU2020226984A AU2020226984B2 (en) 2011-07-01 2020-08-31 System and method for adaptive audio signal generation, coding and rendering
IL277736A IL277736B (en) 2011-07-01 2020-10-01 System and method for adaptive audio signal generation, coding and rendering
US17/156,459 US11412342B2 (en) 2011-07-01 2021-01-22 System and method for adaptive audio signal generation, coding and rendering
IL284585A IL284585B (en) 2011-07-01 2021-07-04 System and method for adaptive audio signal generation, coding and rendering
AU2021258043A AU2021258043B2 (en) 2011-07-01 2021-10-28 System and method for adaptive audio signal generation, coding and rendering
US17/883,440 US11962997B2 (en) 2022-08-08 System and method for adaptive audio signal generation, coding and rendering
AU2023200502A AU2023200502A1 (en) 2011-07-01 2023-01-31 System and method for adaptive audio signal generation, coding and rendering

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161504005P 2011-07-01 2011-07-01
US61/504,005 2011-07-01
US201261636429P 2012-04-20 2012-04-20
US61/636,429 2012-04-20

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US14/130,386 A-371-Of-International US9179236B2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering
US14/866,350 Continuation US9467791B2 (en) 2011-07-01 2015-09-25 System and method for adaptive audio signal generation, coding and rendering

Publications (2)

Publication Number Publication Date
WO2013006338A2 true WO2013006338A2 (en) 2013-01-10
WO2013006338A3 WO2013006338A3 (en) 2013-10-10

Family

ID=46604526

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/044388 WO2013006338A2 (en) 2011-07-01 2012-06-27 System and method for adaptive audio signal generation, coding and rendering

Country Status (22)

Country Link
US (11) US9179236B2 (en)
EP (2) EP2727383B1 (en)
JP (11) JP5912179B2 (en)
KR (9) KR102115723B1 (en)
CN (2) CN103650539B (en)
AR (1) AR086775A1 (en)
AU (7) AU2012279357B2 (en)
BR (2) BR112013033386B1 (en)
CA (3) CA3157717A1 (en)
DK (1) DK2727383T3 (en)
ES (1) ES2871224T3 (en)
HK (1) HK1219604A1 (en)
HU (1) HUE054452T2 (en)
IL (8) IL291043B2 (en)
MX (1) MX2013014684A (en)
MY (1) MY165933A (en)
PL (1) PL2727383T3 (en)
RU (3) RU2731025C2 (en)
SG (1) SG10201604679UA (en)
TW (6) TWI651005B (en)
UA (1) UA124570C2 (en)
WO (1) WO2013006338A2 (en)

Cited By (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025437A1 (en) 2009-08-26 2011-03-03 Svenska Utvecklings Entreprenören Susen Ab Method for wakening up a driver of a motor vehicle
WO2013192111A1 (en) 2012-06-19 2013-12-27 Dolby Laboratories Licensing Corporation Rendering and playback of spatial audio using channel-based audio systems
WO2014036121A1 (en) 2012-08-31 2014-03-06 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
KR101410976B1 (en) 2013-05-31 2014-06-23 한국산업은행 Apparatus and method for positioning of speaker
KR20140092779A (en) * 2013-01-15 2014-07-24 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
WO2014112793A1 (en) * 2013-01-15 2014-07-24 한국전자통신연구원 Encoding/decoding apparatus for processing channel signal and method therefor
KR20140093578A (en) * 2013-01-15 2014-07-28 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
WO2014124261A1 (en) * 2013-02-08 2014-08-14 Qualcomm Incorporated Signaling audio rendering information in a bitstream
WO2014151092A1 (en) 2013-03-15 2014-09-25 Dts, Inc. Automatic multi-channel music mix from multiple audio stems
WO2014160717A1 (en) * 2013-03-28 2014-10-02 Dolby Laboratories Licensing Corporation Using single bitstream to produce tailored audio device mixes
WO2014159272A1 (en) * 2013-03-28 2014-10-02 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
WO2014163657A1 (en) * 2013-04-05 2014-10-09 Thomson Licensing Method for managing reverberant field for immersive audio
JP2014204317A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204316A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204322A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204320A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204321A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204323A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
WO2014177202A1 (en) * 2013-04-30 2014-11-06 Huawei Technologies Co., Ltd. Audio signal processing apparatus
WO2014184353A1 (en) 2013-05-16 2014-11-20 Koninklijke Philips N.V. An audio processing apparatus and method therefor
WO2014184706A1 (en) * 2013-05-16 2014-11-20 Koninklijke Philips N.V. An audio apparatus and method therefor
CN104240711A (en) * 2013-06-18 2014-12-24 杜比实验室特许公司 Self-adaptive audio frequency content generation
WO2014204911A1 (en) * 2013-06-18 2014-12-24 Dolby Laboratories Licensing Corporation Bass management for audio rendering
WO2014209902A1 (en) * 2013-06-28 2014-12-31 Dolby Laboratories Licensing Corporation Improved rendering of audio objects using discontinuous rendering-matrix updates
WO2015006112A1 (en) 2013-07-08 2015-01-15 Dolby Laboratories Licensing Corporation Processing of time-varying metadata for lossless resampling
EP2830047A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for low delay object metadata coding
WO2015017235A1 (en) * 2013-07-31 2015-02-05 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
EP2830332A3 (en) * 2013-07-22 2015-03-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit, and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
WO2015066062A1 (en) * 2013-10-31 2015-05-07 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
WO2015081293A1 (en) * 2013-11-27 2015-06-04 Dts, Inc. Multiplet-based matrix mixing for high-channel count multichannel audio
EP2892250A1 (en) * 2014-01-07 2015-07-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating a plurality of audio channels
WO2015144766A1 (en) * 2014-03-26 2015-10-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
EP2930952A1 (en) * 2012-12-04 2015-10-14 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
WO2015126814A3 (en) * 2014-02-20 2015-10-15 Bose Corporation Content-aware audio modes
JP2015195545A (en) * 2014-03-25 2015-11-05 日本放送協会 Channel number converter
BE1022233B1 (en) * 2013-09-27 2016-03-03 James A Cashin SECURE SYSTEM AND METHOD FOR PROCESSING AUDIO SOUND
JP2016507088A (en) * 2013-06-19 2016-03-07 ドルビー ラボラトリーズ ライセンシング コーポレイション Audio encoder and decoder with program information or substream structure metadata
WO2016050900A1 (en) * 2014-10-03 2016-04-07 Dolby International Ab Smart access to personalized audio
JP2016510905A (en) * 2013-03-01 2016-04-11 クゥアルコム・インコーポレイテッドQualcomm Incorporated Specify spherical harmonics and / or higher order ambisonics coefficients in bitstream
US9338573B2 (en) 2013-07-30 2016-05-10 Dts, Inc. Matrix decoder with constant-power pairwise panning
US20160133267A1 (en) * 2013-07-22 2016-05-12 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
JP2016519788A (en) * 2013-04-03 2016-07-07 ドルビー ラボラトリーズ ライセンシング コーポレイション Method and system for interactive rendering of object-based audio
US9483228B2 (en) 2013-08-26 2016-11-01 Dolby Laboratories Licensing Corporation Live engine
JP6022685B2 (en) * 2013-06-10 2016-11-09 株式会社ソシオネクスト Audio playback apparatus and method
TWI560699B (en) * 2013-07-22 2016-12-01 Fraunhofer Ges Forschung Apparatus and method for efficient object metadata coding
WO2016202682A1 (en) * 2015-06-17 2016-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
EP2997573A4 (en) * 2013-05-17 2017-01-18 Nokia Technologies OY Spatial object oriented audio apparatus
EP3007168A4 (en) * 2013-05-31 2017-01-25 Sony Corporation Encoding device and method, decoding device and method, and program
JP2017503375A (en) * 2013-11-14 2017-01-26 ドルビー ラボラトリーズ ライセンシング コーポレイション Audio versus screen rendering and audio encoding and decoding for such rendering
WO2017023423A1 (en) * 2015-07-31 2017-02-09 Apple Inc. Encoded audio metadata-based equalization
US9578435B2 (en) 2013-07-22 2017-02-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for enhanced spatial audio object coding
US9609452B2 (en) 2013-02-08 2017-03-28 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
EP3059732A4 (en) * 2013-10-17 2017-04-19 Socionext Inc. Audio encoding device and audio decoding device
JPWO2015182491A1 (en) * 2014-05-30 2017-04-20 ソニー株式会社 Information processing apparatus and information processing method
JPWO2015186535A1 (en) * 2014-06-06 2017-04-20 ソニー株式会社 Audio signal processing apparatus and method, encoding apparatus and method, and program
JPWO2016002738A1 (en) * 2014-06-30 2017-05-25 ソニー株式会社 Information processing apparatus and information processing method
EP3039674A4 (en) * 2013-08-28 2017-06-07 Landr Audio Inc. System and method for performing automatic audio production using semantic data
US9712939B2 (en) 2013-07-30 2017-07-18 Dolby Laboratories Licensing Corporation Panning of audio objects to arbitrary speaker layouts
JPWO2016052191A1 (en) * 2014-09-30 2017-07-20 ソニー株式会社 Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
GB2550877A (en) * 2016-05-26 2017-12-06 Univ Surrey Object-based audio rendering
US9883310B2 (en) 2013-02-08 2018-01-30 Qualcomm Incorporated Obtaining symmetry information for higher order ambisonic audio renderers
EP3197182A4 (en) * 2014-08-13 2018-04-18 Samsung Electronics Co., Ltd. Method and device for generating and playing back audio signal
RU2655994C2 (en) * 2013-04-26 2018-05-30 Сони Корпорейшн Audio processing device and audio processing system
US10034117B2 (en) 2013-11-28 2018-07-24 Dolby Laboratories Licensing Corporation Position-based gain adjustment of object-based audio and ring-based channel audio
EP3358862A1 (en) * 2017-02-06 2018-08-08 Visteon Global Technologies, Inc. Method and device for stereophonic depiction of virtual noise sources in a vehicle
US10057707B2 (en) 2015-02-03 2018-08-21 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
US10063985B2 (en) 2015-05-14 2018-08-28 Dolby Laboratories Licensing Corporation Generation and playback of near-field audio content
US10068577B2 (en) 2014-04-25 2018-09-04 Dolby Laboratories Licensing Corporation Audio segmentation based on spatial metadata
EP3451706A1 (en) * 2014-03-24 2019-03-06 Dolby International AB Method and device for applying dynamic range compression to a higher order ambisonics signal
US10251007B2 (en) 2015-11-20 2019-04-02 Dolby Laboratories Licensing Corporation System and method for rendering an audio program
US10257636B2 (en) 2015-04-21 2019-04-09 Dolby Laboratories Licensing Corporation Spatial audio signal manipulation
US10341770B2 (en) 2015-09-30 2019-07-02 Apple Inc. Encoded audio metadata-based loudness equalization and dynamic equalization during DRC
US10354359B2 (en) 2013-08-21 2019-07-16 Interdigital Ce Patent Holdings Video display with pan function controlled by viewing direction
WO2019158750A1 (en) * 2018-02-19 2019-08-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for object-based spatial audio-mastering
US10425764B2 (en) 2015-08-14 2019-09-24 Dts, Inc. Bass management for object-based audio
JP2019207435A (en) * 2014-10-03 2019-12-05 ドルビー・インターナショナル・アーベー Smart access to personalized audio
US10567185B2 (en) 2015-02-03 2020-02-18 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
CN110827810A (en) * 2013-07-04 2020-02-21 三星电子株式会社 Apparatus and method for recognizing speech and text
CN111164679A (en) * 2017-10-05 2020-05-15 索尼公司 Encoding device and method, decoding device and method, and program
US10674299B2 (en) 2014-04-11 2020-06-02 Samsung Electronics Co., Ltd. Method and apparatus for rendering sound signal, and computer-readable recording medium
US10764709B2 (en) 2017-01-13 2020-09-01 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for dynamic equalization for cross-talk cancellation
EP3719789A1 (en) * 2019-04-03 2020-10-07 Yamaha Corporation Sound signal processor and sound signal processing method
WO2021003397A1 (en) * 2019-07-03 2021-01-07 Qualcomm Incorporated Password-based authorization for audio rendering
WO2021003351A1 (en) * 2019-07-03 2021-01-07 Qualcomm Incorporated Adapting audio streams for rendering
US10956121B2 (en) 2013-09-12 2021-03-23 Dolby Laboratories Licensing Corporation Dynamic range control for a wide variety of playback environments
US11170796B2 (en) 2015-06-19 2021-11-09 Sony Corporation Multiple metadata part-based encoding apparatus, encoding method, decoding apparatus, decoding method, and program
US11363398B2 (en) * 2014-12-11 2022-06-14 Dolby Laboratories Licensing Corporation Metadata-preserved audio object clustering
EP4002870A4 (en) * 2019-07-19 2022-09-28 Sony Group Corporation Signal processing device and method, and program
JP7182751B1 (en) 2019-12-02 2022-12-02 ドルビー ラボラトリーズ ライセンシング コーポレイション System, method, and apparatus for conversion of channel-based audio to object-based audio

Families Citing this family (210)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102115723B1 (en) 2011-07-01 2020-05-28 돌비 레버러토리즈 라이쎈싱 코오포레이션 System and method for adaptive audio signal generation, coding and rendering
US9589571B2 (en) 2012-07-19 2017-03-07 Dolby Laboratories Licensing Corporation Method and device for improving the rendering of multi-channel audio signals
EP2863657B1 (en) * 2012-07-31 2019-09-18 Intellectual Discovery Co., Ltd. Method and device for processing audio signal
RU2602346C2 (en) 2012-08-31 2016-11-20 Долби Лэборетериз Лайсенсинг Корпорейшн Rendering of reflected sound for object-oriented audio information
WO2014035902A2 (en) 2012-08-31 2014-03-06 Dolby Laboratories Licensing Corporation Reflected and direct rendering of upmixed content to individually addressable drivers
EP2891339B1 (en) 2012-08-31 2017-08-16 Dolby Laboratories Licensing Corporation Bi-directional interconnect for communication between a renderer and an array of individually addressable drivers
EP2891336B1 (en) 2012-08-31 2017-10-04 Dolby Laboratories Licensing Corporation Virtual rendering of object-based audio
BR122021021506B1 (en) * 2012-09-12 2023-01-31 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V APPARATUS AND METHOD FOR PROVIDING ENHANCED GUIDED DOWNMIX CAPABILITIES FOR 3D AUDIO
KR20140047509A (en) * 2012-10-12 2014-04-22 한국전자통신연구원 Audio coding/decoding apparatus using reverberation signal of object audio signal
US9805725B2 (en) 2012-12-21 2017-10-31 Dolby Laboratories Licensing Corporation Object clustering for rendering object-based audio content based on perceptual criteria
TWI635753B (en) 2013-01-07 2018-09-11 美商杜比實驗室特許公司 Virtual height filter for reflected sound rendering using upward firing drivers
EP2757558A1 (en) * 2013-01-18 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Time domain level adjustment for audio signal decoding or encoding
US10038957B2 (en) * 2013-03-19 2018-07-31 Nokia Technologies Oy Audio mixing based upon playing device location
CN105144751A (en) * 2013-04-15 2015-12-09 英迪股份有限公司 Audio signal processing method using generating virtual object
US9705953B2 (en) * 2013-06-17 2017-07-11 Adobe Systems Incorporated Local control of digital signal processing
CN105493182B (en) * 2013-08-28 2020-01-21 杜比实验室特许公司 Hybrid waveform coding and parametric coding speech enhancement
US9067135B2 (en) 2013-10-07 2015-06-30 Voyetra Turtle Beach, Inc. Method and system for dynamic control of game audio based on audio analysis
US9338541B2 (en) 2013-10-09 2016-05-10 Voyetra Turtle Beach, Inc. Method and system for in-game visualization based on audio analysis
US9716958B2 (en) * 2013-10-09 2017-07-25 Voyetra Turtle Beach, Inc. Method and system for surround sound processing in a headset
US10063982B2 (en) 2013-10-09 2018-08-28 Voyetra Turtle Beach, Inc. Method and system for a game headset with audio alerts based on audio track analysis
US8979658B1 (en) 2013-10-10 2015-03-17 Voyetra Turtle Beach, Inc. Dynamic adjustment of game controller sensitivity based on audio analysis
KR102231755B1 (en) * 2013-10-25 2021-03-24 삼성전자주식회사 Method and apparatus for 3D sound reproducing
US9888333B2 (en) * 2013-11-11 2018-02-06 Google Technology Holdings LLC Three-dimensional audio rendering techniques
US9704491B2 (en) 2014-02-11 2017-07-11 Disney Enterprises, Inc. Storytelling environment: distributed immersive audio soundscape
KR102370031B1 (en) * 2014-03-18 2022-03-04 코닌클리케 필립스 엔.브이. Audiovisual content item data streams
KR102429841B1 (en) * 2014-03-21 2022-08-05 돌비 인터네셔널 에이비 Method for compressing a higher order ambisonics(hoa) signal, method for decompressing a compressed hoa signal, apparatus for compressing a hoa signal, and apparatus for decompressing a compressed hoa signal
EP2922057A1 (en) 2014-03-21 2015-09-23 Thomson Licensing Method for compressing a Higher Order Ambisonics (HOA) signal, method for decompressing a compressed HOA signal, apparatus for compressing a HOA signal, and apparatus for decompressing a compressed HOA signal
US10412522B2 (en) 2014-03-21 2019-09-10 Qualcomm Incorporated Inserting audio channels into descriptions of soundfields
EP2925024A1 (en) 2014-03-26 2015-09-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for audio rendering employing a geometric distance definition
HK1195445A2 (en) * 2014-05-08 2014-11-07 黃偉明 Endpoint mixing system and reproduction method of endpoint mixed sounds
CN109068260B (en) * 2014-05-21 2020-11-27 杜比国际公司 System and method for configuring playback of audio via a home audio playback system
EP3149971B1 (en) * 2014-05-30 2018-08-29 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
US10139907B2 (en) 2014-06-16 2018-11-27 Immersion Corporation Systems and methods for foley-style haptic content creation
JP6607183B2 (en) * 2014-07-18 2019-11-20 ソニー株式会社 Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
WO2016018787A1 (en) * 2014-07-31 2016-02-04 Dolby Laboratories Licensing Corporation Audio processing systems and methods
CN105657633A (en) * 2014-09-04 2016-06-08 杜比实验室特许公司 Method for generating metadata aiming at audio object
US9782672B2 (en) * 2014-09-12 2017-10-10 Voyetra Turtle Beach, Inc. Gaming headset with enhanced off-screen awareness
US9774974B2 (en) 2014-09-24 2017-09-26 Electronics And Telecommunications Research Institute Audio metadata providing apparatus and method, and multichannel audio data playback apparatus and method to support dynamic format conversion
EP3198887A1 (en) * 2014-09-24 2017-08-02 Dolby Laboratories Licensing Corp. Overhead speaker system
US20160094914A1 (en) * 2014-09-30 2016-03-31 Alcatel-Lucent Usa Inc. Systems and methods for localizing audio streams via acoustic large scale speaker arrays
KR102482162B1 (en) * 2014-10-01 2022-12-29 돌비 인터네셔널 에이비 Audio encoder and decoder
KR102226817B1 (en) * 2014-10-01 2021-03-11 삼성전자주식회사 Method for reproducing contents and an electronic device thereof
MY179448A (en) * 2014-10-02 2020-11-06 Dolby Int Ab Decoding method and decoder for dialog enhancement
EP3213323B1 (en) 2014-10-31 2018-12-12 Dolby International AB Parametric encoding and decoding of multichannel audio signals
US9560467B2 (en) * 2014-11-11 2017-01-31 Google Inc. 3D immersive spatial audio systems and methods
US10609475B2 (en) 2014-12-05 2020-03-31 Stages Llc Active noise control and customized audio system
US10057705B2 (en) * 2015-01-13 2018-08-21 Harman International Industries, Incorporated System and method for transitioning between audio system modes
JP6550756B2 (en) * 2015-01-20 2019-07-31 ヤマハ株式会社 Audio signal processor
EP3254477A1 (en) 2015-02-03 2017-12-13 Dolby Laboratories Licensing Corporation Adaptive audio construction
CN105992120B (en) * 2015-02-09 2019-12-31 杜比实验室特许公司 Upmixing of audio signals
CN105989845B (en) 2015-02-25 2020-12-08 杜比实验室特许公司 Video content assisted audio object extraction
US9933991B2 (en) * 2015-03-10 2018-04-03 Harman International Industries, Limited Remote controlled digital audio mixing system
TWI693594B (en) * 2015-03-13 2020-05-11 瑞典商杜比國際公司 Decoding audio bitstreams with enhanced spectral band replication metadata in at least one fill element
WO2016148552A2 (en) * 2015-03-19 2016-09-22 (주)소닉티어랩 Device and method for reproducing three-dimensional sound image in sound image externalization
US10992727B2 (en) * 2015-04-08 2021-04-27 Sony Corporation Transmission apparatus, transmission method, reception apparatus, and reception method
WO2016172111A1 (en) * 2015-04-20 2016-10-27 Dolby Laboratories Licensing Corporation Processing audio data to compensate for partial hearing loss or an adverse hearing environment
US20160315722A1 (en) * 2015-04-22 2016-10-27 Apple Inc. Audio stem delivery and control
US10304467B2 (en) 2015-04-24 2019-05-28 Sony Corporation Transmission device, transmission method, reception device, and reception method
KR102357293B1 (en) * 2015-05-26 2022-01-28 삼성전자주식회사 Stereophonic sound reproduction method and apparatus
US9985676B2 (en) * 2015-06-05 2018-05-29 Braven, Lc Multi-channel mixing console
US9530426B1 (en) * 2015-06-24 2016-12-27 Microsoft Technology Licensing, Llc Filtering sounds for conferencing applications
DE102015008000A1 (en) * 2015-06-24 2016-12-29 Saalakustik.De Gmbh Method for reproducing sound in reflection environments, in particular in listening rooms
US10334387B2 (en) 2015-06-25 2019-06-25 Dolby Laboratories Licensing Corporation Audio panning transformation system and method
GB2540226A (en) * 2015-07-08 2017-01-11 Nokia Technologies Oy Distributed audio microphone array and locator configuration
CN105187625B (en) * 2015-07-13 2018-11-16 努比亚技术有限公司 A kind of electronic equipment and audio-frequency processing method
GB2540404B (en) * 2015-07-16 2019-04-10 Powerchord Group Ltd Synchronising an audio signal
GB2540407B (en) * 2015-07-16 2020-05-20 Powerchord Group Ltd Personal audio mixer
GB2529310B (en) * 2015-07-16 2016-11-30 Powerchord Group Ltd A method of augmenting an audio content
CN105070304B (en) 2015-08-11 2018-09-04 小米科技有限责任公司 Realize method and device, the electronic equipment of multi-object audio recording
KR102423753B1 (en) 2015-08-20 2022-07-21 삼성전자주식회사 Method and apparatus for processing audio signal based on speaker location information
US9832590B2 (en) * 2015-09-12 2017-11-28 Dolby Laboratories Licensing Corporation Audio program playback calibration based on content creation environment
WO2017058097A1 (en) * 2015-09-28 2017-04-06 Razer (Asia-Pacific) Pte. Ltd. Computers, methods for controlling a computer, and computer-readable media
US20170098452A1 (en) * 2015-10-02 2017-04-06 Dts, Inc. Method and system for audio processing of dialog, music, effect and height objects
US9877137B2 (en) * 2015-10-06 2018-01-23 Disney Enterprises, Inc. Systems and methods for playing a venue-specific object-based audio
CN108141674A (en) 2015-10-21 2018-06-08 富士胶片株式会社 Audio-video system
US9807535B2 (en) * 2015-10-30 2017-10-31 International Business Machines Corporation Three dimensional audio speaker array
CN105979349A (en) * 2015-12-03 2016-09-28 乐视致新电子科技(天津)有限公司 Audio frequency data processing method and device
CN108370482B (en) 2015-12-18 2020-07-28 杜比实验室特许公司 Dual directional speaker for presenting immersive audio content
WO2017126895A1 (en) * 2016-01-19 2017-07-27 지오디오랩 인코포레이티드 Device and method for processing audio signal
WO2017130210A1 (en) * 2016-01-27 2017-08-03 Indian Institute Of Technology Bombay Method and system for rendering audio streams
US11290819B2 (en) 2016-01-29 2022-03-29 Dolby Laboratories Licensing Corporation Distributed amplification and control system for immersive audio multi-channel amplifier
CN105656915B (en) * 2016-01-29 2019-01-18 腾讯科技(深圳)有限公司 Immediate communication methods, devices and systems
EP3408936B1 (en) 2016-01-29 2019-12-04 Dolby Laboratories Licensing Corporation Multi-channel amplifier with continuous class-d modulator and embedded pld and resonant frequency detector
US10778160B2 (en) 2016-01-29 2020-09-15 Dolby Laboratories Licensing Corporation Class-D dynamic closed loop feedback amplifier
CN112218229B (en) 2016-01-29 2022-04-01 杜比实验室特许公司 System, method and computer readable medium for audio signal processing
US9924291B2 (en) * 2016-02-16 2018-03-20 Sony Corporation Distributed wireless speaker system
US10573324B2 (en) * 2016-02-24 2020-02-25 Dolby International Ab Method and system for bit reservoir control in case of varying metadata
CN105898669B (en) * 2016-03-18 2017-10-20 南京青衿信息科技有限公司 A kind of coding method of target voice
US11528554B2 (en) 2016-03-24 2022-12-13 Dolby Laboratories Licensing Corporation Near-field rendering of immersive audio content in portable computers and devices
US10325610B2 (en) * 2016-03-30 2019-06-18 Microsoft Technology Licensing, Llc Adaptive audio rendering
EP3472832A4 (en) 2016-06-17 2020-03-11 DTS, Inc. Distance panning using near / far-field rendering
US20170372697A1 (en) * 2016-06-22 2017-12-28 Elwha Llc Systems and methods for rule-based user control of audio rendering
US10951985B1 (en) * 2016-07-01 2021-03-16 Gebre Waddell Method and system for audio critical listening and evaluation
US9956910B2 (en) * 2016-07-18 2018-05-01 Toyota Motor Engineering & Manufacturing North America, Inc. Audible notification systems and methods for autonomous vehicles
JP7404067B2 (en) * 2016-07-22 2023-12-25 ドルビー ラボラトリーズ ライセンシング コーポレイション Network-based processing and delivery of multimedia content for live music performances
CN106375778B (en) * 2016-08-12 2020-04-17 南京青衿信息科技有限公司 Method for transmitting three-dimensional audio program code stream conforming to digital movie specification
GB201615538D0 (en) * 2016-09-13 2016-10-26 Nokia Technologies Oy A method , apparatus and computer program for processing audio signals
WO2018055860A1 (en) * 2016-09-20 2018-03-29 ソニー株式会社 Information processing device, information processing method and program
JP6693569B2 (en) * 2016-09-28 2020-05-13 ヤマハ株式会社 Mixer, control method of mixer, and program
GB2554447A (en) 2016-09-28 2018-04-04 Nokia Technologies Oy Gain control in spatial audio systems
CN109791193B (en) * 2016-09-29 2023-11-10 杜比实验室特许公司 Automatic discovery and localization of speaker locations in a surround sound system
US10349196B2 (en) * 2016-10-03 2019-07-09 Nokia Technologies Oy Method of editing audio signals using separated objects and associated apparatus
US10419866B2 (en) 2016-10-07 2019-09-17 Microsoft Technology Licensing, Llc Shared three-dimensional audio bed
US9980078B2 (en) * 2016-10-14 2018-05-22 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US10516914B2 (en) * 2016-10-19 2019-12-24 Centurylink Intellectual Property Llc Method and system for implementing automatic audio optimization for streaming services
EP3470976A1 (en) 2017-10-12 2019-04-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for efficient delivery and usage of audio messages for high quality of experience
US10535355B2 (en) 2016-11-18 2020-01-14 Microsoft Technology Licensing, Llc Frame coding for spatial audio data
US10945080B2 (en) 2016-11-18 2021-03-09 Stages Llc Audio analysis and processing system
JP7014176B2 (en) 2016-11-25 2022-02-01 ソニーグループ株式会社 Playback device, playback method, and program
JP6993774B2 (en) * 2016-12-07 2022-01-14 シャープ株式会社 Audio output controller
US11012803B2 (en) * 2017-01-27 2021-05-18 Auro Technologies Nv Processing method and system for panning audio objects
WO2018150774A1 (en) * 2017-02-17 2018-08-23 シャープ株式会社 Voice signal processing device and voice signal processing system
WO2018173413A1 (en) * 2017-03-24 2018-09-27 シャープ株式会社 Audio signal processing device and audio signal processing system
EP3624116B1 (en) * 2017-04-13 2022-05-04 Sony Group Corporation Signal processing device, method, and program
US9820073B1 (en) 2017-05-10 2017-11-14 Tls Corp. Extracting a common signal from multiple audio signals
US11595774B2 (en) 2017-05-12 2023-02-28 Microsoft Technology Licensing, Llc Spatializing audio data based on analysis of incoming audio data
US9843883B1 (en) * 2017-05-12 2017-12-12 QoSound, Inc. Source independent sound field rotation for virtual and augmented reality applications
US20180357038A1 (en) * 2017-06-09 2018-12-13 Qualcomm Incorporated Audio metadata modification at rendering device
WO2018231185A1 (en) * 2017-06-16 2018-12-20 Василий Васильевич ДУМА Method of synchronizing sound signals
US10028069B1 (en) 2017-06-22 2018-07-17 Sonos, Inc. Immersive audio in a media playback system
US10516962B2 (en) 2017-07-06 2019-12-24 Huddly As Multi-channel binaural recording and dynamic playback
US11386913B2 (en) 2017-08-01 2022-07-12 Dolby Laboratories Licensing Corporation Audio object classification based on location metadata
US11272308B2 (en) * 2017-09-29 2022-03-08 Apple Inc. File format for spatial audio
US11128977B2 (en) 2017-09-29 2021-09-21 Apple Inc. Spatial audio downmixing
FR3072840B1 (en) * 2017-10-23 2021-06-04 L Acoustics SPACE ARRANGEMENT OF SOUND DISTRIBUTION DEVICES
US11102022B2 (en) 2017-11-10 2021-08-24 Hewlett-Packard Development Company, L.P. Conferencing environment monitoring
US10440497B2 (en) * 2017-11-17 2019-10-08 Intel Corporation Multi-modal dereverbaration in far-field audio systems
US10511909B2 (en) * 2017-11-29 2019-12-17 Boomcloud 360, Inc. Crosstalk cancellation for opposite-facing transaural loudspeaker systems
EP3726859A4 (en) 2017-12-12 2021-04-14 Sony Corporation Signal processing device and method, and program
TWI702594B (en) 2018-01-26 2020-08-21 瑞典商都比國際公司 Backward-compatible integration of high frequency reconstruction techniques for audio signals
ES2922532T3 (en) * 2018-02-01 2022-09-16 Fraunhofer Ges Forschung Audio scene encoder, audio scene decoder, and related procedures using hybrid encoder/decoder spatial analysis
KR102482960B1 (en) 2018-02-07 2022-12-29 삼성전자주식회사 Method for playing audio data using dual speaker and electronic device thereof
US10514882B2 (en) 2018-02-21 2019-12-24 Microsoft Technology Licensing, Llc Digital audio processing system for adjoining digital audio stems based on computed audio intensity/characteristics
WO2019199359A1 (en) 2018-04-08 2019-10-17 Dts, Inc. Ambisonic depth extraction
CN115334444A (en) * 2018-04-11 2022-11-11 杜比国际公司 Method, apparatus and system for pre-rendering signals for audio rendering
BR112020016912A2 (en) * 2018-04-16 2020-12-15 Dolby Laboratories Licensing Corporation METHODS, DEVICES AND SYSTEMS FOR ENCODING AND DECODING DIRECTIONAL SOURCES
US10672405B2 (en) * 2018-05-07 2020-06-02 Google Llc Objective quality metrics for ambisonic spatial audio
US10630870B2 (en) * 2018-06-20 2020-04-21 Gdc Technology (Shenzhen) Limited System and method for augmented reality movie screenings
EP3588988B1 (en) * 2018-06-26 2021-02-17 Nokia Technologies Oy Selective presentation of ambient audio content for spatial audio presentation
MX2020009578A (en) 2018-07-02 2020-10-05 Dolby Laboratories Licensing Corp Methods and devices for generating or decoding a bitstream comprising immersive audio signals.
US20200007988A1 (en) * 2018-07-02 2020-01-02 Microchip Technology Incorporated Wireless signal source based audio output and related systems, methods and devices
US10445056B1 (en) * 2018-07-03 2019-10-15 Disney Enterprises, Inc. System for deliverables versioning in audio mastering
CN110675889A (en) 2018-07-03 2020-01-10 阿里巴巴集团控股有限公司 Audio signal processing method, client and electronic equipment
US10455078B1 (en) * 2018-07-11 2019-10-22 International Business Machines Corporation Enhancing privacy in mobile phone calls by caller controlled audio delivering modes
GB2575510A (en) * 2018-07-13 2020-01-15 Nokia Technologies Oy Spatial augmentation
US11159327B2 (en) * 2018-08-06 2021-10-26 Tyson York Winarski Blockchain augmentation of a material exchange format MXF file
JP7019096B2 (en) 2018-08-30 2022-02-14 ドルビー・インターナショナル・アーベー Methods and equipment to control the enhancement of low bit rate coded audio
US10404467B1 (en) * 2018-09-09 2019-09-03 Tyson York Winarski Blockchain digest augmention of media files including group-of-pictures video streams for MXF files
US20200081681A1 (en) * 2018-09-10 2020-03-12 Spotify Ab Mulitple master music playback
JP7363795B2 (en) * 2018-09-28 2023-10-18 ソニーグループ株式会社 Information processing device, method, and program
US10932344B2 (en) * 2018-10-09 2021-02-23 Rovi Guides, Inc. Systems and methods for emulating an environment created by the outputs of a plurality of devices
CN111869239B (en) 2018-10-16 2021-10-08 杜比实验室特许公司 Method and apparatus for bass management
EP3870991A4 (en) 2018-10-24 2022-08-17 Otto Engineering Inc. Directional awareness audio communications system
EP4344194A2 (en) * 2018-11-13 2024-03-27 Dolby Laboratories Licensing Corporation Audio processing in immersive audio services
CN109451417B (en) * 2018-11-29 2024-03-15 广州艾美网络科技有限公司 Multichannel audio processing method and system
EP3900373A4 (en) * 2018-12-18 2022-08-10 Intel Corporation Display-based audio splitting in media environments
US11503422B2 (en) * 2019-01-22 2022-11-15 Harman International Industries, Incorporated Mapping virtual sound sources to physical speakers in extended reality applications
KR20200107757A (en) * 2019-03-08 2020-09-16 엘지전자 주식회사 Method and apparatus for sound object following
US11206504B2 (en) * 2019-04-02 2021-12-21 Syng, Inc. Systems and methods for spatial audio rendering
US11087738B2 (en) * 2019-06-11 2021-08-10 Lucasfilm Entertainment Company Ltd. LLC System and method for music and effects sound mix creation in audio soundtrack versioning
CN112233647A (en) * 2019-06-26 2021-01-15 索尼公司 Information processing apparatus and method, and computer-readable storage medium
CN112153530B (en) * 2019-06-28 2022-05-27 苹果公司 Spatial audio file format for storing capture metadata
US11841899B2 (en) 2019-06-28 2023-12-12 Apple Inc. Spatial audio file format for storing capture metadata
JP2022539217A (en) 2019-07-02 2022-09-07 ドルビー・インターナショナル・アーベー Method, Apparatus, and System for Representing, Encoding, and Decoding Discrete Directional Information
WO2021007246A1 (en) 2019-07-09 2021-01-14 Dolby Laboratories Licensing Corporation Presentation independent mastering of audio content
JP2021048500A (en) * 2019-09-19 2021-03-25 ソニー株式会社 Signal processing apparatus, signal processing method, and signal processing system
TWI735968B (en) * 2019-10-09 2021-08-11 名世電子企業股份有限公司 Sound field type natural environment sound system
TW202123220A (en) 2019-10-30 2021-06-16 美商杜拜研究特許公司 Multichannel audio encode and decode using directional metadata
US11096006B1 (en) * 2019-11-04 2021-08-17 Facebook Technologies, Llc Dynamic speech directivity reproduction
CN110782865B (en) * 2019-11-06 2023-08-18 上海音乐学院 Three-dimensional sound creation interactive system
US11533560B2 (en) 2019-11-15 2022-12-20 Boomcloud 360 Inc. Dynamic rendering device metadata-informed audio enhancement system
WO2021098957A1 (en) * 2019-11-20 2021-05-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio object renderer, methods for determining loudspeaker gains and computer program using panned object loudspeaker gains and spread object loudspeaker gains
WO2021099363A2 (en) * 2019-11-20 2021-05-27 Dolby International Ab Methods and devices for personalizing audio content
RU2721180C1 (en) * 2019-12-02 2020-05-18 Самсунг Электроникс Ко., Лтд. Method for generating an animation model of a head based on a speech signal and an electronic computing device which implements it
KR20210072388A (en) * 2019-12-09 2021-06-17 삼성전자주식회사 Audio outputting apparatus and method of controlling the audio outputting appratus
EP4073792A1 (en) * 2019-12-09 2022-10-19 Dolby Laboratories Licensing Corp. Adjusting audio and non-audio features based on noise metrics and speech intelligibility metrics
JP7443870B2 (en) 2020-03-24 2024-03-06 ヤマハ株式会社 Sound signal output method and sound signal output device
US11900412B2 (en) * 2020-03-25 2024-02-13 Applied Minds, Llc Audience participation application, system, and method of use
CN111586553B (en) * 2020-05-27 2022-06-03 京东方科技集团股份有限公司 Display device and working method thereof
US11275629B2 (en) * 2020-06-25 2022-03-15 Microsoft Technology Licensing, Llc Mixed reality complementary systems
WO2022010454A1 (en) * 2020-07-06 2022-01-13 Hewlett-Packard Development Company, L.P. Binaural down-mixing of audio signals
CA3187342A1 (en) * 2020-07-30 2022-02-03 Guillaume Fuchs Apparatus, method and computer program for encoding an audio signal or for decoding an encoded audio scene
CN112398455B (en) * 2020-10-21 2022-09-27 头领科技(昆山)有限公司 Adaptive power amplifier chip and adaptive control method thereof
CN112312298A (en) 2020-11-19 2021-02-02 北京小米松果电子有限公司 Audio playing method and device, electronic equipment and storage medium
US11930348B2 (en) * 2020-11-24 2024-03-12 Naver Corporation Computer system for realizing customized being-there in association with audio and method thereof
KR102500694B1 (en) 2020-11-24 2023-02-16 네이버 주식회사 Computer system for producing audio content for realizing customized being-there and method thereof
US11930349B2 (en) 2020-11-24 2024-03-12 Naver Corporation Computer system for producing audio content for realizing customized being-there and method thereof
US11521623B2 (en) 2021-01-11 2022-12-06 Bank Of America Corporation System and method for single-speaker identification in a multi-speaker environment on a low-frequency audio recording
CN114915874B (en) * 2021-02-10 2023-07-25 北京全景声信息科技有限公司 Audio processing method, device, equipment and medium
RU2759666C1 (en) * 2021-02-19 2021-11-16 Общество с ограниченной ответственностью «ЯЛОС СТРИМ» Audio-video data playback system
KR20220146165A (en) * 2021-04-23 2022-11-01 삼성전자주식회사 An electronic apparatus and a method for processing audio signal
GB2618016A (en) * 2021-04-30 2023-10-25 That Corp Passive sub-audible room path learning with noise modeling
EP4310839A1 (en) * 2021-05-21 2024-01-24 Samsung Electronics Co., Ltd. Apparatus and method for processing multi-channel audio signal
KR20240014462A (en) * 2021-05-28 2024-02-01 돌비 레버러토리즈 라이쎈싱 코오포레이션 Adjusting the dynamic range of spatial audio objects
CN113938811A (en) * 2021-09-01 2022-01-14 赛因芯微(北京)电子科技有限公司 Audio channel metadata based on sound bed, generation method, equipment and storage medium
CN113923584A (en) * 2021-09-01 2022-01-11 赛因芯微(北京)电子科技有限公司 Matrix-based audio channel metadata and generation method, equipment and storage medium
CN113905321A (en) * 2021-09-01 2022-01-07 赛因芯微(北京)电子科技有限公司 Object-based audio channel metadata and generation method, device and storage medium
CN113905322A (en) * 2021-09-01 2022-01-07 赛因芯微(北京)电子科技有限公司 Method, device and storage medium for generating metadata based on binaural audio channel
CN113963724A (en) * 2021-09-18 2022-01-21 赛因芯微(北京)电子科技有限公司 Audio content metadata and generation method, electronic device and storage medium
CN114143695A (en) * 2021-10-15 2022-03-04 赛因芯微(北京)电子科技有限公司 Audio stream metadata and generation method, electronic equipment and storage medium
CN114363790A (en) * 2021-11-26 2022-04-15 赛因芯微(北京)电子科技有限公司 Method, apparatus, device and medium for generating metadata of serial audio block format
CN114363792A (en) * 2021-11-26 2022-04-15 赛因芯微(北京)电子科技有限公司 Transmission audio track format serial metadata generation method, device, equipment and medium
CN114363791A (en) * 2021-11-26 2022-04-15 赛因芯微(北京)电子科技有限公司 Serial audio metadata generation method, device, equipment and storage medium
US11902771B2 (en) * 2021-12-27 2024-02-13 Spatialx Inc. Audio space simulation in a localized audio environment
CN114510212B (en) * 2021-12-31 2023-08-08 赛因芯微(北京)电子科技有限公司 Data transmission method, device and equipment based on serial digital audio interface
CN114509043A (en) * 2022-02-15 2022-05-17 深圳须弥云图空间科技有限公司 Spatial object coding method, device, equipment and medium
CN117581566A (en) * 2022-05-05 2024-02-20 北京小米移动软件有限公司 Audio processing method, device and storage medium
KR102504081B1 (en) * 2022-08-18 2023-02-28 주식회사 킨트 System for mastering sound files
KR102608935B1 (en) * 2023-04-06 2023-12-04 뉴튠(주) Method and apparatus for providing real-time audio mixing service based on user information
CN116594586B (en) * 2023-07-18 2023-09-26 苏州清听声学科技有限公司 Vehicle-mounted adaptive audio playback system and method

Family Cites Families (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155510A (en) 1990-11-29 1992-10-13 Digital Theater Systems Corporation Digital sound system for motion pictures with analog sound track emulation
RU1332U1 (en) 1993-11-25 1995-12-16 Магаданское государственное геологическое предприятие "Новая техника" Hydraulic monitor
US5717765A (en) 1994-03-07 1998-02-10 Sony Corporation Theater sound system with upper surround channels
JPH0951600A (en) * 1995-08-03 1997-02-18 Fujitsu Ten Ltd Sound effect reproducing system
US5642423A (en) 1995-11-22 1997-06-24 Sony Corporation Digital surround sound processor
US5970152A (en) * 1996-04-30 1999-10-19 Srs Labs, Inc. Audio enhancement system for use in a surround sound environment
US6229899B1 (en) 1996-07-17 2001-05-08 American Technology Corporation Method and device for developing a virtual speaker distant from the sound source
US6164018A (en) 1997-12-08 2000-12-26 Shopro, Inc. Cinematic theater and theater multiplex
US6624873B1 (en) 1998-05-05 2003-09-23 Dolby Laboratories Licensing Corporation Matrix-encoded surround-sound channels in a discrete digital sound format
US6931370B1 (en) * 1999-11-02 2005-08-16 Digital Theater Systems, Inc. System and method for providing interactive audio in a multi-channel audio environment
US6771323B1 (en) 1999-11-15 2004-08-03 Thx Ltd. Audio visual display adjustment using captured content characteristics
EP1134724B1 (en) * 2000-03-17 2008-07-23 Sony France S.A. Real time audio spatialisation system with high level control
WO2001082651A1 (en) * 2000-04-19 2001-11-01 Sonic Solutions Multi-channel surround sound mastering and reproduction techniques that preserve spatial harmonics in three dimensions
US7212872B1 (en) 2000-05-10 2007-05-01 Dts, Inc. Discrete multichannel audio with a backward compatible mix
US6970822B2 (en) 2001-03-07 2005-11-29 Microsoft Corporation Accessing audio processing components in an audio generation system
KR20030015806A (en) 2001-08-17 2003-02-25 최해용 Optical system for theatrical visual & sound
CN100508026C (en) * 2002-04-10 2009-07-01 皇家飞利浦电子股份有限公司 Coding of stereo signals
JP2003348700A (en) * 2002-05-28 2003-12-05 Victor Co Of Japan Ltd Presence signal generating method, and presence signal generating apparatus
US20030223603A1 (en) 2002-05-28 2003-12-04 Beckman Kenneth Oren Sound space replication
DE10254404B4 (en) * 2002-11-21 2004-11-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio reproduction system and method for reproducing an audio signal
GB0301093D0 (en) * 2003-01-17 2003-02-19 1 Ltd Set-up method for array-type sound systems
GB0304126D0 (en) * 2003-02-24 2003-03-26 1 Ltd Sound beam loudspeaker system
FR2853802B1 (en) 2003-04-11 2005-06-24 Pierre Denis Rene Vincent INSTALLATION FOR THE PROJECTION OF CINEMATOGRAPHIC OR DIGITAL AUDIO WORKS
US20070136050A1 (en) 2003-07-07 2007-06-14 Koninklijke Philips Electronics N.V. System and method for audio signal processing
US6972828B2 (en) 2003-12-18 2005-12-06 Eastman Kodak Company Method and system for preserving the creative intent within a motion picture production chain
SE0400997D0 (en) 2004-04-16 2004-04-16 Coding Technologies Sweden Ab Efficient coding of multi-channel audio
SE0400998D0 (en) * 2004-04-16 2004-04-16 Coding Technologies Sweden Ab Method for representing multi-channel audio signals
US7106411B2 (en) 2004-05-05 2006-09-12 Imax Corporation Conversion of cinema theatre to a super cinema theatre
WO2006091540A2 (en) * 2005-02-22 2006-08-31 Verax Technologies Inc. System and method for formatting multimode sound content and metadata
DE102005008342A1 (en) * 2005-02-23 2006-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio-data files storage device especially for driving a wave-field synthesis rendering device, uses control device for controlling audio data files written on storage device
DE102005008343A1 (en) * 2005-02-23 2006-09-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for providing data in a multi-renderer system
DE102005008366A1 (en) * 2005-02-23 2006-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device for driving wave-field synthesis rendering device with audio objects, has unit for supplying scene description defining time sequence of audio objects
JP2006304165A (en) * 2005-04-25 2006-11-02 Yamaha Corp Speaker array system
DE102005033238A1 (en) * 2005-07-15 2007-01-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for driving a plurality of loudspeakers by means of a DSP
KR100897971B1 (en) * 2005-07-29 2009-05-18 하르만 인터내셔날 인더스트리즈, 인코포레이티드 Audio tuning system
KR100733965B1 (en) 2005-11-01 2007-06-29 한국전자통신연구원 Object-based audio transmitting/receiving system and method
KR20080093419A (en) * 2006-02-07 2008-10-21 엘지전자 주식회사 Apparatus and method for encoding/decoding signal
EP1843635B1 (en) * 2006-04-05 2010-12-08 Harman Becker Automotive Systems GmbH Method for automatically equalizing a sound system
DE102006022346B4 (en) * 2006-05-12 2008-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Information signal coding
US8625808B2 (en) 2006-09-29 2014-01-07 Lg Electronics Inc. Methods and apparatuses for encoding and decoding object-based audio signals
JP5337941B2 (en) * 2006-10-16 2013-11-06 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ Apparatus and method for multi-channel parameter conversion
CN101001485A (en) * 2006-10-23 2007-07-18 中国传媒大学 Finite sound source multi-channel sound field system and sound field analogy method
JP5270566B2 (en) 2006-12-07 2013-08-21 エルジー エレクトロニクス インコーポレイティド Audio processing method and apparatus
US7788395B2 (en) 2007-02-14 2010-08-31 Microsoft Corporation Adaptive media playback
CN101675472B (en) 2007-03-09 2012-06-20 Lg电子株式会社 A method and an apparatus for processing an audio signal
JP5220840B2 (en) 2007-03-30 2013-06-26 エレクトロニクス アンド テレコミュニケーションズ リサーチ インスチチュート Multi-object audio signal encoding and decoding apparatus and method for multi-channel
EP2158587A4 (en) 2007-06-08 2010-06-02 Lg Electronics Inc A method and an apparatus for processing an audio signal
US8396574B2 (en) 2007-07-13 2013-03-12 Dolby Laboratories Licensing Corporation Audio processing using auditory scene analysis and spectral skewness
WO2009115299A1 (en) * 2008-03-20 2009-09-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Device and method for acoustic indication
JP5174527B2 (en) 2008-05-14 2013-04-03 日本放送協会 Acoustic signal multiplex transmission system, production apparatus and reproduction apparatus to which sound image localization acoustic meta information is added
US8315396B2 (en) 2008-07-17 2012-11-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating audio output signals using object based metadata
US7996422B2 (en) * 2008-07-22 2011-08-09 At&T Intellectual Property L.L.P. System and method for adaptive media playback based on destination
US7796190B2 (en) 2008-08-15 2010-09-14 At&T Labs, Inc. System and method for adaptive content rendition
US8793749B2 (en) 2008-08-25 2014-07-29 Broadcom Corporation Source frame adaptation and matching optimally to suit a recipient video device
US8798776B2 (en) 2008-09-30 2014-08-05 Dolby International Ab Transcoding of audio metadata
US8351612B2 (en) 2008-12-02 2013-01-08 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
US8786682B2 (en) * 2009-03-05 2014-07-22 Primesense Ltd. Reference image techniques for three-dimensional sensing
CN102461208B (en) 2009-06-19 2015-09-23 杜比实验室特许公司 User-specific features for a scalable media kernel and engine
US8136142B2 (en) 2009-07-02 2012-03-13 Ericsson Television, Inc. Centralized content management system for managing distribution of packages to video service providers
US8396575B2 (en) 2009-08-14 2013-03-12 Dts Llc Object-oriented audio streaming system
US9384299B2 (en) * 2009-09-22 2016-07-05 Thwapr, Inc. Receiving content for mobile media sharing
US20110088076A1 (en) * 2009-10-08 2011-04-14 Futurewei Technologies, Inc. System and Method for Media Adaptation
WO2011045813A2 (en) 2009-10-15 2011-04-21 Tony Joy A method and product to transparently deliver audio through fusion of fixed loudspeakers and headphones to deliver the sweet spot experience
BR112012012097B1 (en) * 2009-11-20 2021-01-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for providing an upmix signal representation based on the downmix signal representation, apparatus for providing a bit stream representing a multichannel audio signal, methods and bit stream representing a multichannel audio signal using a linear combination parameter
EP2507788A4 (en) 2009-12-02 2014-06-18 Thomson Licensing Optimizing content calibration for home theaters
KR102115723B1 (en) * 2011-07-01 2020-05-28 돌비 레버러토리즈 라이쎈싱 코오포레이션 System and method for adaptive audio signal generation, coding and rendering
US20130163794A1 (en) * 2011-12-22 2013-06-27 Motorola Mobility, Inc. Dynamic control of audio on a mobile device with respect to orientation of the mobile device
RS1332U (en) 2013-04-24 2013-08-30 Tomislav Stanojević Total surround sound system with floor loudspeakers
EP2830336A3 (en) * 2013-07-22 2015-03-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Renderer controlled spatial upmix

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None

Cited By (350)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025437A1 (en) 2009-08-26 2011-03-03 Svenska Utvecklings Entreprenören Susen Ab Method for wakening up a driver of a motor vehicle
WO2013192111A1 (en) 2012-06-19 2013-12-27 Dolby Laboratories Licensing Corporation Rendering and playback of spatial audio using channel-based audio systems
EP3253079A1 (en) 2012-08-31 2017-12-06 Dolby Laboratories Licensing Corp. System for rendering and playback of object based audio in various listening environments
WO2014036121A1 (en) 2012-08-31 2014-03-06 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
EP4207817A1 (en) 2012-08-31 2023-07-05 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
EP2930952A4 (en) * 2012-12-04 2016-09-14 Samsung Electronics Co Ltd Audio providing apparatus and audio providing method
EP2930952A1 (en) * 2012-12-04 2015-10-14 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
US10341800B2 (en) 2012-12-04 2019-07-02 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
US10149084B2 (en) 2012-12-04 2018-12-04 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
US9774973B2 (en) 2012-12-04 2017-09-26 Samsung Electronics Co., Ltd. Audio providing apparatus and audio providing method
CN109166587B (en) * 2013-01-15 2023-02-03 韩国电子通信研究院 Encoding/decoding apparatus and method for processing channel signal
CN108806706B (en) * 2013-01-15 2022-11-15 韩国电子通信研究院 Encoding/decoding apparatus and method for processing channel signal
KR20210018382A (en) * 2013-01-15 2021-02-17 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
KR102213895B1 (en) * 2013-01-15 2021-02-08 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
KR102322104B1 (en) 2013-01-15 2021-11-05 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
KR20210134279A (en) * 2013-01-15 2021-11-09 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
KR102357924B1 (en) * 2013-01-15 2022-02-08 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
US10068579B2 (en) 2013-01-15 2018-09-04 Electronics And Telecommunications Research Institute Encoding/decoding apparatus for processing channel signal and method therefor
CN108806706A (en) * 2013-01-15 2018-11-13 韩国电子통신연구원 Encoding/decoding apparatus and method for processing channel signal
KR20220020849A (en) * 2013-01-15 2022-02-21 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
KR20200112774A (en) * 2013-01-15 2020-10-05 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
US11289105B2 (en) 2013-01-15 2022-03-29 Electronics And Telecommunications Research Institute Encoding/decoding apparatus for processing channel signal and method therefor
KR102160218B1 (en) * 2013-01-15 2020-09-28 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
KR20140092779A (en) * 2013-01-15 2014-07-24 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
CN109166588A (en) * 2013-01-15 2019-01-08 한국전자통신연구원 Encoding/decoding apparatus and method for processing channel signal
CN109166587A (en) * 2013-01-15 2019-01-08 한국전자통신연구원 Encoding/decoding apparatus and method for processing channel signal
US11875802B2 (en) 2013-01-15 2024-01-16 Electronics And Telecommunications Research Institute Encoding/decoding apparatus for processing channel signal and method
CN105009207A (en) * 2013-01-15 2015-10-28 韩国电子通信研究院 Encoding/decoding apparatus for processing channel signal and method therefor
KR102458956B1 (en) 2013-01-15 2022-10-26 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
US10332532B2 (en) 2013-01-15 2019-06-25 Electronics And Telecommunications Research Institute Encoding/decoding apparatus for processing channel signal and method therefor
WO2014112793A1 (en) * 2013-01-15 2014-07-24 한국전자통신연구원 Encoding/decoding apparatus for processing channel signal and method therefor
KR20140093578A (en) * 2013-01-15 2014-07-28 한국전자통신연구원 Audio signal procsessing apparatus and method for sound bar
CN109166588B (en) * 2013-01-15 2022-11-15 韩国电子通信研究院 Encoding/decoding apparatus and method for processing channel signal
KR102477610B1 (en) * 2013-01-15 2022-12-14 한국전자통신연구원 Encoding/decoding apparatus and method for controlling multichannel signals
US10178489B2 (en) 2013-02-08 2019-01-08 Qualcomm Incorporated Signaling audio rendering information in a bitstream
US9883310B2 (en) 2013-02-08 2018-01-30 Qualcomm Incorporated Obtaining symmetry information for higher order ambisonic audio renderers
US9870778B2 (en) 2013-02-08 2018-01-16 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
US9609452B2 (en) 2013-02-08 2017-03-28 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
WO2014124261A1 (en) * 2013-02-08 2014-08-14 Qualcomm Incorporated Signaling audio rendering information in a bitstream
CN104981869A (en) * 2013-02-08 2015-10-14 高通股份有限公司 Signaling audio rendering information in a bitstream
RU2661775C2 (en) * 2013-02-08 2018-07-19 Квэлкомм Инкорпорейтед Transmission of audio rendering signal in bitstream
CN104981869B (en) * 2013-02-08 2019-04-26 高通股份有限公司 Signaling audio rendering information in a bitstream
JP2016510435A (en) * 2013-02-08 2016-04-07 クゥアルコム・インコーポレイテッドQualcomm Incorporated Signal audio rendering information in a bitstream
JP2016510905A (en) * 2013-03-01 2016-04-11 クゥアルコム・インコーポレイテッドQualcomm Incorporated Specify spherical harmonics and / or higher order ambisonics coefficients in bitstream
EP2974010A4 (en) * 2013-03-15 2016-11-23 Dts Inc Automatic multi-channel music mix from multiple audio stems
CN105075117B (en) * 2013-03-15 2020-02-18 Dts(英属维尔京群岛)有限公司 System and method for automatic multi-channel music mixing based on multiple audio backbones
US9640163B2 (en) 2013-03-15 2017-05-02 Dts, Inc. Automatic multi-channel music mix from multiple audio stems
US11132984B2 (en) 2013-03-15 2021-09-28 Dts, Inc. Automatic multi-channel music mix from multiple audio stems
WO2014151092A1 (en) 2013-03-15 2014-09-25 Dts, Inc. Automatic multi-channel music mix from multiple audio stems
CN105075117A (en) * 2013-03-15 2015-11-18 Dts(英属维尔京群岛)有限公司 Automatic multi-channel music mix from multiple audio stems
US9674630B2 (en) 2013-03-28 2017-06-06 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
JP2016511990A (en) * 2013-03-28 2016-04-21 ドルビー ラボラトリーズ ライセンシング コーポレイション Render audio objects with an apparent size to any loudspeaker layout
JP5897778B1 (en) * 2013-03-28 2016-03-30 ドルビー ラボラトリーズ ライセンシング コーポレイション Render audio objects with an apparent size to any loudspeaker layout
RU2630955C9 (en) * 2013-03-28 2017-09-29 Долби Лабораторис Лайсэнзин Корпорейшн Presentation of audio object data with apparent size in loudspeaker location arrangements
WO2014160717A1 (en) * 2013-03-28 2014-10-02 Dolby Laboratories Licensing Corporation Using single bitstream to produce tailored audio device mixes
RU2630955C2 (en) * 2013-03-28 2017-09-14 Долби Лабораторис Лайсэнзин Корпорейшн Presentation of audio object data with apparent size in loudspeaker location arrangements
EP3282716A1 (en) * 2013-03-28 2018-02-14 Dolby Laboratories Licensing Corp. Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
RU2742195C2 (en) * 2013-03-28 2021-02-03 Долби Лабораторис Лайсэнзин Корпорейшн Presenting audio object data with apparent size into random arrangement patterns of loudspeakers
US11564051B2 (en) 2013-03-28 2023-01-24 Dolby Laboratories Licensing Corporation Methods and apparatus for rendering audio objects
WO2014159272A1 (en) * 2013-03-28 2014-10-02 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
US9992600B2 (en) 2013-03-28 2018-06-05 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
EP3668121A1 (en) * 2013-03-28 2020-06-17 Dolby Laboratories Licensing Corp. Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
JP2016146642A (en) * 2013-03-28 2016-08-12 ドルビー ラボラトリーズ ライセンシング コーポレイション Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
US10652684B2 (en) 2013-03-28 2020-05-12 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
US11019447B2 (en) 2013-03-28 2021-05-25 Dolby Laboratories Licensing Corporation Rendering of audio objects with apparent size to arbitrary loudspeaker layouts
US9900720B2 (en) 2013-03-28 2018-02-20 Dolby Laboratories Licensing Corporation Using single bitstream to produce tailored audio device mixes
JP2016521380A (en) * 2013-04-03 2016-07-21 ドルビー ラボラトリーズ ライセンシング コーポレイション Method and system for generating and rendering object-based audio with conditional rendering metadata
US10832690B2 (en) 2013-04-03 2020-11-10 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
CN107731239A (en) * 2013-04-03 2018-02-23 杜比实验室特许公司 For generating and interactively rendering the method and system of object-based audio
US11727945B2 (en) 2013-04-03 2023-08-15 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
JP2016520858A (en) * 2013-04-03 2016-07-14 ドルビー ラボラトリーズ ライセンシング コーポレイション Method and system for generating and interactively rendering object-based audio
US11568881B2 (en) 2013-04-03 2023-01-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
JP2016519788A (en) * 2013-04-03 2016-07-07 ドルビー ラボラトリーズ ライセンシング コーポレイション Method and system for interactive rendering of object-based audio
US9881622B2 (en) 2013-04-03 2018-01-30 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10276172B2 (en) 2013-04-03 2019-04-30 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
US11081118B2 (en) 2013-04-03 2021-08-03 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US11270713B2 (en) 2013-04-03 2022-03-08 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US10388291B2 (en) 2013-04-03 2019-08-20 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10748547B2 (en) 2013-04-03 2020-08-18 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US10515644B2 (en) 2013-04-03 2019-12-24 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US10553225B2 (en) 2013-04-03 2020-02-04 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US11769514B2 (en) 2013-04-03 2023-09-26 Dolby Laboratories Licensing Corporation Methods and systems for rendering object based audio
US9997164B2 (en) 2013-04-03 2018-06-12 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US11948586B2 (en) 2013-04-03 2024-04-02 Dolby Laboratories Licensing Corporation Methods and systems for generating and rendering object based audio with conditional rendering metadata
US9805727B2 (en) 2013-04-03 2017-10-31 Dolby Laboratories Licensing Corporation Methods and systems for generating and interactively rendering object based audio
JP2014204316A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204323A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204321A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204320A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
JP2014204322A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
US20160050508A1 (en) * 2013-04-05 2016-02-18 William Gebbens REDMANN Method for managing reverberant field for immersive audio
JP2014204317A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
WO2014163657A1 (en) * 2013-04-05 2014-10-09 Thomson Licensing Method for managing reverberant field for immersive audio
RU2655994C2 (en) * 2013-04-26 2018-05-30 Сони Корпорейшн Audio processing device and audio processing system
WO2014177202A1 (en) * 2013-04-30 2014-11-06 Huawei Technologies Co., Ltd. Audio signal processing apparatus
RU2667630C2 (en) * 2013-05-16 2018-09-21 Конинклейке Филипс Н.В. Device for audio processing and method therefor
WO2014184706A1 (en) * 2013-05-16 2014-11-20 Koninklijke Philips N.V. An audio apparatus and method therefor
US11503424B2 (en) 2013-05-16 2022-11-15 Koninklijke Philips N.V. Audio processing apparatus and method therefor
US11743673B2 (en) 2013-05-16 2023-08-29 Koninklijke Philips N.V. Audio processing apparatus and method therefor
US11197120B2 (en) 2013-05-16 2021-12-07 Koninklijke Philips N.V. Audio processing apparatus and method therefor
CN105191354A (en) * 2013-05-16 2015-12-23 皇家飞利浦有限公司 An audio processing apparatus and method therefor
CN105247894B (en) * 2013-05-16 2017-11-07 皇家飞利浦有限公司 Audio apparatus and method therefor
CN105247894A (en) * 2013-05-16 2016-01-13 皇家飞利浦有限公司 Audio apparatus and method therefor
RU2671627C2 (en) * 2013-05-16 2018-11-02 Конинклейке Филипс Н.В. Audio apparatus and method therefor
WO2014184353A1 (en) 2013-05-16 2014-11-20 Koninklijke Philips N.V. An audio processing apparatus and method therefor
US10582330B2 (en) 2013-05-16 2020-03-03 Koninklijke Philips N.V. Audio processing apparatus and method therefor
JP2016521532A (en) * 2013-05-16 2016-07-21 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Audio processing apparatus and method
US9860669B2 (en) 2013-05-16 2018-01-02 Koninklijke Philips N.V. Audio apparatus and method therefor
EP2997573A4 (en) * 2013-05-17 2017-01-18 Nokia Technologies OY Spatial object oriented audio apparatus
US9706324B2 (en) 2013-05-17 2017-07-11 Nokia Technologies Oy Spatial object oriented audio apparatus
EP3007168A4 (en) * 2013-05-31 2017-01-25 Sony Corporation Encoding device and method, decoding device and method, and program
KR101410976B1 (en) 2013-05-31 2014-06-23 한국산업은행 Apparatus and method for positioning of speaker
JP6022685B2 (en) * 2013-06-10 2016-11-09 株式会社ソシオネクスト Audio playback apparatus and method
JPWO2014199536A1 (en) * 2013-06-10 2017-02-23 株式会社ソシオネクスト Audio playback apparatus and method
US9788120B2 (en) 2013-06-10 2017-10-10 Socionext Inc. Audio playback device and audio playback method
US9723425B2 (en) 2013-06-18 2017-08-01 Dolby Laboratories Licensing Corporation Bass management for audio rendering
CN104240711A (en) * 2013-06-18 2014-12-24 杜比实验室特许公司 Adaptive audio content generation
EP3474575A1 (en) * 2013-06-18 2019-04-24 Dolby Laboratories Licensing Corporation Bass management for audio rendering
EP3011762B1 (en) * 2013-06-18 2020-04-22 Dolby Laboratories Licensing Corporation Adaptive audio content generation
CN105340300B (en) * 2013-06-18 2018-04-13 杜比实验室特许公司 Bass management for audio rendering
WO2014204911A1 (en) * 2013-06-18 2014-12-24 Dolby Laboratories Licensing Corporation Bass management for audio rendering
CN105340300A (en) * 2013-06-18 2016-02-17 杜比实验室特许公司 Bass management for audio rendering
JP2016507088A (en) * 2013-06-19 2016-03-07 ドルビー ラボラトリーズ ライセンシング コーポレイション Audio encoder and decoder with program information or substream structure metadata
US10147436B2 (en) 2013-06-19 2018-12-04 Dolby Laboratories Licensing Corporation Audio encoder and decoder with program information or substream structure metadata
US11404071B2 (en) 2013-06-19 2022-08-02 Dolby Laboratories Licensing Corporation Audio encoder and decoder with dynamic range compression metadata
US11823693B2 (en) 2013-06-19 2023-11-21 Dolby Laboratories Licensing Corporation Audio encoder and decoder with dynamic range compression metadata
US9959878B2 (en) 2013-06-19 2018-05-01 Dolby Laboratories Licensing Corporation Audio encoder and decoder with dynamic range compression metadata
US10037763B2 (en) 2013-06-19 2018-07-31 Dolby Laboratories Licensing Corporation Audio encoder and decoder with program information or substream structure metadata
US9883311B2 (en) 2013-06-28 2018-01-30 Dolby Laboratories Licensing Corporation Rendering of audio objects using discontinuous rendering-matrix updates
WO2014209902A1 (en) * 2013-06-28 2014-12-31 Dolby Laboratories Licensing Corporation Improved rendering of audio objects using discontinuous rendering-matrix updates
CN110827810A (en) * 2013-07-04 2020-02-21 三星电子株式会社 Apparatus and method for recognizing speech and text
US9858932B2 (en) 2013-07-08 2018-01-02 Dolby Laboratories Licensing Corporation Processing of time-varying metadata for lossless resampling
WO2015006112A1 (en) 2013-07-08 2015-01-15 Dolby Laboratories Licensing Corporation Processing of time-varying metadata for lossless resampling
EP3133840A1 (en) * 2013-07-22 2017-02-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
RU2672386C1 (en) * 2013-07-22 2018-11-14 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Device and method for conversion of first and second input channels at least in one output channel
US11337019B2 (en) 2013-07-22 2022-05-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
TWI560699B (en) * 2013-07-22 2016-12-01 Fraunhofer Ges Forschung Apparatus and method for efficient object metadata coding
US9936327B2 (en) 2013-07-22 2018-04-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
EP2830047A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for low delay object metadata coding
US11910176B2 (en) 2013-07-22 2024-02-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
EP3518563B1 (en) * 2013-07-22 2022-05-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for mapping first and second input channels to at least one output channel
US11877141B2 (en) 2013-07-22 2024-01-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
RU2640647C2 (en) * 2013-07-22 2018-01-10 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Device and method of transforming first and second input channels, at least, in one output channel
EP3258710A1 (en) * 2013-07-22 2017-12-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for mapping first and second input channels to at least one output channel
EP2830049A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for efficient object metadata coding
US11463831B2 (en) 2013-07-22 2022-10-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
US11330386B2 (en) 2013-07-22 2022-05-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
RU2635903C2 (en) * 2013-07-22 2017-11-16 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Method and signal processor for converting plurality of input channels from configuration of input channels to output channels from configuration of output channels
US10277998B2 (en) 2013-07-22 2019-04-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
CN111883148A (en) * 2013-07-22 2020-11-03 弗朗霍夫应用科学研究促进协会 Apparatus and method for low latency object metadata encoding
US10798512B2 (en) 2013-07-22 2020-10-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US9578435B2 (en) 2013-07-22 2017-02-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for enhanced spatial audio object coding
TWI562652B (en) * 2013-07-22 2016-12-11 Fraunhofer Ges Forschung Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US20160133267A1 (en) * 2013-07-22 2016-05-12 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
WO2015011000A1 (en) * 2013-07-22 2015-01-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for efficient object metadata coding
RU2666282C2 (en) * 2013-07-22 2018-09-06 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Apparatus and method for efficient object metadata coding
US9699584B2 (en) 2013-07-22 2017-07-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
WO2015010962A3 (en) * 2013-07-22 2015-03-26 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit, and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
WO2015010996A1 (en) * 2013-07-22 2015-01-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for low delay object metadata coding
US10715943B2 (en) 2013-07-22 2020-07-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
US10249311B2 (en) 2013-07-22 2019-04-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
US9788136B2 (en) 2013-07-22 2017-10-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US11272309B2 (en) 2013-07-22 2022-03-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for mapping first and second input channels to at least one output channel
US10701504B2 (en) 2013-07-22 2020-06-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
RU2672175C2 (en) * 2013-07-22 2018-11-12 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Apparatus and method for low delay object metadata coding
US10701507B2 (en) 2013-07-22 2020-06-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for mapping first and second input channels to at least one output channel
WO2015010961A3 (en) * 2013-07-22 2015-03-26 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method, and computer program for mapping first and second input channels to at least one output channel
CN105474310A (en) * 2013-07-22 2016-04-06 弗朗霍夫应用科学研究促进协会 Apparatus and method for low delay object metadata coding
US9743210B2 (en) 2013-07-22 2017-08-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
EP2830332A3 (en) * 2013-07-22 2015-03-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit, and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US10154362B2 (en) 2013-07-22 2018-12-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for mapping first and second input channels to at least one output channel
CN105556991A (en) * 2013-07-22 2016-05-04 弗朗霍夫应用科学研究促进协会 Method, signal processing unit, and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US11227616B2 (en) 2013-07-22 2022-01-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
US10659900B2 (en) 2013-07-22 2020-05-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
CN105556991B (en) * 2013-07-22 2017-07-11 弗朗霍夫应用科学研究促进协会 Method and signal processing unit for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US9338573B2 (en) 2013-07-30 2016-05-10 Dts, Inc. Matrix decoder with constant-power pairwise panning
US9712939B2 (en) 2013-07-30 2017-07-18 Dolby Laboratories Licensing Corporation Panning of audio objects to arbitrary speaker layouts
US10075797B2 (en) 2013-07-30 2018-09-11 Dts, Inc. Matrix decoder with constant-power pairwise panning
KR101681529B1 (en) 2013-07-31 2016-12-01 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
KR102484214B1 (en) 2013-07-31 2023-01-04 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
JP2018174590A (en) * 2013-07-31 2018-11-08 ドルビー ラボラトリーズ ライセンシング コーポレイション Processing of spatially spread or large audio object
KR102327504B1 (en) * 2013-07-31 2021-11-17 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
WO2015017235A1 (en) * 2013-07-31 2015-02-05 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
US10595152B2 (en) 2013-07-31 2020-03-17 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
RU2716037C2 (en) * 2013-07-31 2020-03-05 Долби Лэборетериз Лайсенсинг Корпорейшн Processing of spatially-diffuse or large sound objects
KR20210141766A (en) * 2013-07-31 2021-11-23 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
KR20160140971A (en) * 2013-07-31 2016-12-07 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
US11736890B2 (en) 2013-07-31 2023-08-22 Dolby Laboratories Licensing Corporation Method, apparatus or systems for processing audio objects
KR20160021892A (en) * 2013-07-31 2016-02-26 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
JP7116144B2 (en) 2013-07-31 2022-08-09 ドルビー ラボラトリーズ ライセンシング コーポレイション Processing spatially diffuse or large audio objects
US11064310B2 (en) 2013-07-31 2021-07-13 Dolby Laboratories Licensing Corporation Method, apparatus or systems for processing audio objects
KR102395351B1 (en) 2013-07-31 2022-05-10 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
US9654895B2 (en) 2013-07-31 2017-05-16 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
CN110797037A (en) * 2013-07-31 2020-02-14 杜比实验室特许公司 Method and apparatus for processing audio data, medium, and device
US10003907B2 (en) 2013-07-31 2018-06-19 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
JP2021036729A (en) * 2013-07-31 2021-03-04 ドルビー ラボラトリーズ ライセンシング コーポレイション Processing of spatially spread or large audio object
KR20220061284A (en) * 2013-07-31 2022-05-12 돌비 레버러토리즈 라이쎈싱 코오포레이션 Processing spatially diffuse or large audio objects
EP3564951A1 (en) * 2013-07-31 2019-11-06 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
RU2646344C2 (en) * 2013-07-31 2018-03-02 Долби Лэборетериз Лайсенсинг Корпорейшн Processing of spatially diffuse or large sound objects
US10354359B2 (en) 2013-08-21 2019-07-16 Interdigital Ce Patent Holdings Video display with pan function controlled by viewing direction
US9483228B2 (en) 2013-08-26 2016-11-01 Dolby Laboratories Licensing Corporation Live engine
EP3039674A4 (en) * 2013-08-28 2017-06-07 Landr Audio Inc. System and method for performing automatic audio production using semantic data
US11429341B2 (en) 2013-09-12 2022-08-30 Dolby International Ab Dynamic range control for a wide variety of playback environments
US10956121B2 (en) 2013-09-12 2021-03-23 Dolby Laboratories Licensing Corporation Dynamic range control for a wide variety of playback environments
US11842122B2 (en) 2013-09-12 2023-12-12 Dolby Laboratories Licensing Corporation Dynamic range control for a wide variety of playback environments
BE1022233B1 (en) * 2013-09-27 2016-03-03 James A Cashin SECURE SYSTEM AND METHOD FOR PROCESSING AUDIO SOUND
US10002616B2 (en) 2013-10-17 2018-06-19 Socionext Inc. Audio decoding device
EP3059732A4 (en) * 2013-10-17 2017-04-19 Socionext Inc. Audio encoding device and audio decoding device
US9779740B2 (en) 2013-10-17 2017-10-03 Socionext Inc. Audio encoding device and audio decoding device
EP3672285A1 (en) * 2013-10-31 2020-06-24 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US11681490B2 (en) 2013-10-31 2023-06-20 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
CN113630711A (en) * 2013-10-31 2021-11-09 杜比实验室特许公司 Binaural rendering of headphones using metadata processing
US9933989B2 (en) 2013-10-31 2018-04-03 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
CN109040946A (en) * 2013-10-31 2018-12-18 杜比实验室特许公司 Binaural rendering for headphones using metadata processing
CN108712711B (en) * 2013-10-31 2021-06-15 杜比实验室特许公司 Binaural rendering of headphones using metadata processing
WO2015066062A1 (en) * 2013-10-31 2015-05-07 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10503461B2 (en) 2013-10-31 2019-12-10 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10255027B2 (en) 2013-10-31 2019-04-09 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US11269586B2 (en) 2013-10-31 2022-03-08 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
US10838684B2 (en) 2013-10-31 2020-11-17 Dolby Laboratories Licensing Corporation Binaural rendering for headphones using metadata processing
CN108712711A (en) * 2013-10-31 2018-10-26 杜比实验室特许公司 Binaural rendering for headphones using metadata processing
CN109040946B (en) * 2013-10-31 2021-09-14 杜比实验室特许公司 Binaural rendering of headphones using metadata processing
CN109068263A (en) * 2013-10-31 2018-12-21 杜比实验室特许公司 Binaural rendering for headphones using metadata processing
CN113630711B (en) * 2013-10-31 2023-12-01 杜比实验室特许公司 Binaural rendering of headphones using metadata processing
CN105684467A (en) * 2013-10-31 2016-06-15 杜比实验室特许公司 Binaural rendering for headphones using metadata processing
US9813837B2 (en) 2013-11-14 2017-11-07 Dolby Laboratories Licensing Corporation Screen-relative rendering of audio and encoding and decoding of audio for such rendering
JP2017503375A (en) * 2013-11-14 2017-01-26 ドルビー ラボラトリーズ ライセンシング コーポレイション Screen-relative rendering of audio and encoding and decoding of audio for such rendering
US9552819B2 (en) 2013-11-27 2017-01-24 Dts, Inc. Multiplet-based matrix mixing for high-channel count multichannel audio
CN105981411A (en) * 2013-11-27 2016-09-28 Dts(英属维尔京群岛)有限公司 Multiplet-based matrix mixing for high-channel count multichannel audio
WO2015081293A1 (en) * 2013-11-27 2015-06-04 Dts, Inc. Multiplet-based matrix mixing for high-channel count multichannel audio
US11743674B2 (en) 2013-11-28 2023-08-29 Dolby International Ab Methods, apparatus and systems for position-based gain adjustment of object-based audio
US10034117B2 (en) 2013-11-28 2018-07-24 Dolby Laboratories Licensing Corporation Position-based gain adjustment of object-based audio and ring-based channel audio
US10631116B2 (en) 2013-11-28 2020-04-21 Dolby Laboratories Licensing Corporation Position-based gain adjustment of object-based audio and ring-based channel audio
US11115776B2 (en) 2013-11-28 2021-09-07 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for position-based gain adjustment of object-based audio
EP3618460A1 (en) * 2014-01-07 2020-03-04 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Apparatus and method for generating a plurality of audio channels
US9729995B2 (en) 2014-01-07 2017-08-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
US11785414B2 (en) 2014-01-07 2023-10-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Apparatus and method for generating a plurality of audio channels
US20170318408A1 (en) * 2014-01-07 2017-11-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
RU2676948C2 (en) * 2014-01-07 2019-01-11 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Apparatus and method for generating plurality of audio channels
US10904693B2 (en) 2014-01-07 2021-01-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
TWI558231B (en) * 2014-01-07 2016-11-11 弗勞恩霍夫爾協會 Apparatus and method for generating a plurality of audio channels
US11438723B2 (en) 2014-01-07 2022-09-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
WO2015104237A1 (en) * 2014-01-07 2015-07-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating a plurality of audio channels
EP2892250A1 (en) * 2014-01-07 2015-07-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating a plurality of audio channels
US10097945B2 (en) 2014-01-07 2018-10-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
US10595153B2 (en) 2014-01-07 2020-03-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a plurality of audio channels
CN105934955A (en) * 2014-01-07 2016-09-07 弗劳恩霍夫应用研究促进协会 Apparatus and method for generating a plurality of audio channels
WO2015126814A3 (en) * 2014-02-20 2015-10-15 Bose Corporation Content-aware audio modes
US9578436B2 (en) 2014-02-20 2017-02-21 Bose Corporation Content-aware audio modes
US10567899B2 (en) 2014-03-24 2020-02-18 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
US10638244B2 (en) 2014-03-24 2020-04-28 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
US10362424B2 (en) 2014-03-24 2019-07-23 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
EP4273857A3 (en) * 2014-03-24 2024-01-17 Dolby International AB Method and device for applying dynamic range compression to a higher order ambisonics signal
EP3451706A1 (en) * 2014-03-24 2019-03-06 Dolby International AB Method and device for applying dynamic range compression to a higher order ambisonics signal
US10893372B2 (en) 2014-03-24 2021-01-12 Dolby Laboratories Licensing Corporation Method and device for applying dynamic range compression to a higher order ambisonics signal
US11838738B2 (en) 2014-03-24 2023-12-05 Dolby Laboratories Licensing Corporation Method and device for applying Dynamic Range Compression to a Higher Order Ambisonics signal
JP2015195545A (en) * 2014-03-25 2015-11-05 日本放送協会 Channel number converter
JP2020182227A (en) * 2014-03-26 2020-11-05 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Device and method for screen-related audio object mapping
EP3487189A1 (en) * 2014-03-26 2019-05-22 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
US11900955B2 (en) 2014-03-26 2024-02-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for screen related audio object remapping
CN106463128A (en) * 2014-03-26 2017-02-22 弗劳恩霍夫应用研究促进协会 Apparatus and method for screen related audio object remapping
WO2015144766A1 (en) * 2014-03-26 2015-10-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
US11527254B2 (en) 2014-03-26 2022-12-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for screen related audio object remapping
EP4254988A3 (en) * 2014-03-26 2023-11-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
JP2017513390A (en) * 2014-03-26 2017-05-25 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Apparatus and method for screen-related audio object remapping
US10854213B2 (en) 2014-03-26 2020-12-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for screen related audio object remapping
EP2928216A1 (en) * 2014-03-26 2015-10-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
AU2015238354B2 (en) * 2014-03-26 2018-11-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for screen related audio object remapping
US10192563B2 (en) 2014-03-26 2019-01-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for screen related audio object remapping
RU2683380C2 (en) * 2014-03-26 2019-03-28 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Device and method for repeated display of screen-related audio objects
CN106463128B (en) * 2014-03-26 2020-02-21 弗劳恩霍夫应用研究促进协会 Apparatus and method for screen-dependent audio object remapping
US11785407B2 (en) 2014-04-11 2023-10-10 Samsung Electronics Co., Ltd. Method and apparatus for rendering sound signal, and computer-readable recording medium
US11245998B2 (en) 2014-04-11 2022-02-08 Samsung Electronics Co.. Ltd. Method and apparatus for rendering sound signal, and computer-readable recording medium
US10674299B2 (en) 2014-04-11 2020-06-02 Samsung Electronics Co., Ltd. Method and apparatus for rendering sound signal, and computer-readable recording medium
US10873822B2 (en) 2014-04-11 2020-12-22 Samsung Electronics Co., Ltd. Method and apparatus for rendering sound signal, and computer-readable recording medium
US10068577B2 (en) 2014-04-25 2018-09-04 Dolby Laboratories Licensing Corporation Audio segmentation based on spatial metadata
EP4177886A1 (en) * 2014-05-30 2023-05-10 Sony Corporation Information processing apparatus and information processing method
JPWO2015182491A1 (en) * 2014-05-30 2017-04-20 ソニー株式会社 Information processing apparatus and information processing method
EP3151240A4 (en) * 2014-05-30 2018-01-24 Sony Corporation Information processing device and information processing method
JPWO2015186535A1 (en) * 2014-06-06 2017-04-20 ソニー株式会社 Audio signal processing apparatus and method, encoding apparatus and method, and program
EP3154279A4 (en) * 2014-06-06 2017-11-01 Sony Corporation Audio signal processing apparatus and method, encoding apparatus and method, and program
JP7080007B2 (en) 2014-06-30 2022-06-03 ソニーグループ株式会社 Information processing equipment and information processing method
JPWO2016002738A1 (en) * 2014-06-30 2017-05-25 ソニー株式会社 Information processing apparatus and information processing method
US10349197B2 (en) 2014-08-13 2019-07-09 Samsung Electronics Co., Ltd. Method and device for generating and playing back audio signal
EP3197182A4 (en) * 2014-08-13 2018-04-18 Samsung Electronics Co., Ltd. Method and device for generating and playing back audio signal
JP7310849B2 (en) 2014-09-30 2023-07-19 ソニーグループ株式会社 Receiving device and receiving method
JPWO2016052191A1 (en) * 2014-09-30 2017-07-20 ソニー株式会社 Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
JP2021105735A (en) * 2014-09-30 2021-07-26 ソニーグループ株式会社 Receiver and reception method
US10856042B2 (en) 2014-09-30 2020-12-01 Sony Corporation Transmission apparatus, transmission method, reception apparatus and reception method for transmitting a plurality of types of audio data items
US11871078B2 (en) 2014-09-30 2024-01-09 Sony Corporation Transmission method, reception apparatus and reception method for transmitting a plurality of types of audio data items
WO2016050900A1 (en) * 2014-10-03 2016-04-07 Dolby International Ab Smart access to personalized audio
JP2018185882A (en) * 2014-10-03 2018-11-22 Dolby International AB Smart access to personalized audio
JP2019207435A (en) * 2014-10-03 2019-12-05 Dolby International AB Smart access to personalized audio
JP7213861B2 (en) 2014-10-03 2023-01-27 Dolby International AB Smart access to personalized audio
US10650833B2 (en) 2014-10-03 2020-05-12 Dolby International Ab Methods, apparatus and system for rendering an audio program
US11948585B2 (en) 2014-10-03 2024-04-02 Dolby International Ab Methods, apparatus and system for rendering an audio program
JP2021064949A (en) * 2014-10-03 2021-04-22 Dolby International AB Smart access to personalized audio
EP3786955A1 (en) * 2014-10-03 2021-03-03 Dolby International AB Smart access to personalized audio
US10089991B2 (en) 2014-10-03 2018-10-02 Dolby International Ab Smart access to personalized audio
JP2018502411A (en) * 2014-10-03 2018-01-25 Dolby International AB Smart access to personalized audio
EP4216217A1 (en) * 2014-10-03 2023-07-26 Dolby International AB Smart access to personalized audio
US11437048B2 (en) 2014-10-03 2022-09-06 Dolby International Ab Methods, apparatus and system for rendering an audio program
US11937064B2 (en) 2014-12-11 2024-03-19 Dolby Laboratories Licensing Corporation Metadata-preserved audio object clustering
US11363398B2 (en) * 2014-12-11 2022-06-14 Dolby Laboratories Licensing Corporation Metadata-preserved audio object clustering
US10567185B2 (en) 2015-02-03 2020-02-18 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
US10057707B2 (en) 2015-02-03 2018-08-21 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
US10257636B2 (en) 2015-04-21 2019-04-09 Dolby Laboratories Licensing Corporation Spatial audio signal manipulation
US11943605B2 (en) 2015-04-21 2024-03-26 Dolby Laboratories Licensing Corporation Spatial audio signal manipulation
US11277707B2 (en) 2015-04-21 2022-03-15 Dolby Laboratories Licensing Corporation Spatial audio signal manipulation
US10728687B2 (en) 2015-04-21 2020-07-28 Dolby Laboratories Licensing Corporation Spatial audio signal manipulation
US10623877B2 (en) 2015-05-14 2020-04-14 Dolby Laboratories Licensing Corporation Generation and playback of near-field audio content
US10063985B2 (en) 2015-05-14 2018-08-28 Dolby Laboratories Licensing Corporation Generation and playback of near-field audio content
US10397720B2 (en) 2015-05-14 2019-08-27 Dolby Laboratories Licensing Corporation Generation and playback of near-field audio content
TWI664623B (en) * 2015-06-17 2019-07-01 弗勞恩霍夫爾協會 Audio processor and corresponding method for loudness control for user interactivity in audio coding systems, and audio encoder
US11379178B2 (en) 2015-06-17 2022-07-05 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
WO2016202682A1 (en) * 2015-06-17 2016-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
RU2685999C1 (en) * 2015-06-17 2019-04-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Volume control for user interactivity in audio coding systems
US10838687B2 (en) 2015-06-17 2020-11-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
US10394520B2 (en) 2015-06-17 2019-08-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
EP4156180A1 (en) 2015-06-17 2023-03-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Loudness control for user interactivity in audio coding systems
US11170796B2 (en) 2015-06-19 2021-11-09 Sony Corporation Multiple metadata part-based encoding apparatus, encoding method, decoding apparatus, decoding method, and program
KR102124547B1 (en) * 2015-07-31 2020-06-18 애플 인크. Encoded audio metadata-based equalization
KR102178231B1 (en) 2015-07-31 2020-11-12 애플 인크. Encoded audio metadata-based equalization
US9934790B2 (en) 2015-07-31 2018-04-03 Apple Inc. Encoded audio metadata-based equalization
US10699726B2 (en) 2015-07-31 2020-06-30 Apple Inc. Encoded audio metadata-based equalization
KR20200074243A (en) * 2015-07-31 2020-06-24 애플 인크. Encoded audio metadata-based equalization
WO2017023423A1 (en) * 2015-07-31 2017-02-09 Apple Inc. Encoded audio metadata-based equalization
EP4290888A3 (en) * 2015-07-31 2024-02-21 Apple Inc. Encoded audio metadata-based equalization
KR20180020295A (en) * 2015-07-31 2018-02-27 애플 인크. Encoded audio metadata-based equalization
US10425764B2 (en) 2015-08-14 2019-09-24 Dts, Inc. Bass management for object-based audio
US10341770B2 (en) 2015-09-30 2019-07-02 Apple Inc. Encoded audio metadata-based loudness equalization and dynamic equalization during DRC
US10251007B2 (en) 2015-11-20 2019-04-02 Dolby Laboratories Licensing Corporation System and method for rendering an audio program
GB2550877A (en) * 2016-05-26 2017-12-06 Univ Surrey Object-based audio rendering
US10764709B2 (en) 2017-01-13 2020-09-01 Dolby Laboratories Licensing Corporation Methods, apparatus and systems for dynamic equalization for cross-talk cancellation
EP3358862A1 (en) * 2017-02-06 2018-08-08 Visteon Global Technologies, Inc. Method and device for stereophonic depiction of virtual noise sources in a vehicle
CN111164679A (en) * 2017-10-05 2020-05-15 索尼公司 Encoding device and method, decoding device and method, and program
CN111164679B (en) * 2017-10-05 2024-04-09 索尼公司 Encoding device and method, decoding device and method, and program
US11595056B2 (en) 2017-10-05 2023-02-28 Sony Corporation Encoding device and method, decoding device and method, and program
EP3693961A4 (en) * 2017-10-05 2020-11-11 Sony Corporation Encoding device and method, decoding device and method, and program
WO2019158750A1 (en) * 2018-02-19 2019-08-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for object-based spatial audio-mastering
EP3719789A1 (en) * 2019-04-03 2020-10-07 Yamaha Corporation Sound signal processor and sound signal processing method
US11089422B2 (en) 2019-04-03 2021-08-10 Yamaha Corporation Sound signal processor and sound signal processing method
WO2021003351A1 (en) * 2019-07-03 2021-01-07 Qualcomm Incorporated Adapting audio streams for rendering
WO2021003397A1 (en) * 2019-07-03 2021-01-07 Qualcomm Incorporated Password-based authorization for audio rendering
US11580213B2 (en) 2019-07-03 2023-02-14 Qualcomm Incorporated Password-based authorization for audio rendering
US10972852B2 (en) 2019-07-03 2021-04-06 Qualcomm Incorporated Adapting audio streams for rendering
EP4002870A4 (en) * 2019-07-19 2022-09-28 Sony Group Corporation Signal processing device and method, and program
JP2022553111A (en) * 2019-12-02 2022-12-21 Dolby Laboratories Licensing Corporation System, method, and apparatus for conversion of channel-based audio to object-based audio
JP7182751B6 (en) 2019-12-02 2022-12-20 Dolby Laboratories Licensing Corporation System, method, and apparatus for conversion of channel-based audio to object-based audio
JP7182751B1 (en) 2019-12-02 2022-12-02 Dolby Laboratories Licensing Corporation System, method, and apparatus for conversion of channel-based audio to object-based audio

Also Published As

Publication number Publication date
TWI651005B (en) 2019-02-11
JP6523585B1 (en) 2019-06-05
KR20220081385A (en) 2022-06-15
IL302167A (en) 2023-06-01
IL277736A (en) 2020-11-30
EP3893521A1 (en) 2021-10-13
JP7009664B2 (en) 2022-01-25
US20200145779A1 (en) 2020-05-07
US10904692B2 (en) 2021-01-26
KR20230170110A (en) 2023-12-18
KR102003191B1 (en) 2019-07-24
TW201811070A (en) 2018-03-16
JP2021073496A (en) 2021-05-13
KR102608968B1 (en) 2023-12-05
KR101946795B1 (en) 2019-02-13
KR102115723B1 (en) 2020-05-28
KR20150013913A (en) 2015-02-05
JP2019144583A (en) 2019-08-29
US20170215020A1 (en) 2017-07-27
CA3157717A1 (en) 2013-01-10
CN105792086B (en) 2019-02-15
IL265741B (en) 2020-10-29
JP2017215592A (en) 2017-12-07
AU2016202227B2 (en) 2018-03-22
IL284585B (en) 2022-04-01
KR20180035937A (en) 2018-04-06
AU2019204012B2 (en) 2020-06-11
BR112013033386B1 (en) 2021-05-04
TWI792203B (en) 2023-02-11
US20180192230A1 (en) 2018-07-05
AU2018203734A1 (en) 2018-06-21
BR112013033386A2 (en) 2017-01-24
KR20190014601A (en) 2019-02-12
AU2020226984B2 (en) 2021-08-19
IL245574A0 (en) 2016-06-30
MY165933A (en) 2018-05-18
JP5912179B2 (en) 2016-04-27
AU2021258043B2 (en) 2022-11-03
US9622009B2 (en) 2017-04-11
TW202339510A (en) 2023-10-01
JP6759442B2 (en) 2020-09-23
JP2021005876A (en) 2021-01-14
SG10201604679UA (en) 2016-07-28
KR101845226B1 (en) 2018-05-18
IL284585A (en) 2021-08-31
KR101685447B1 (en) 2016-12-12
TW202139720A (en) 2021-10-16
JP2016165117A (en) 2016-09-08
AU2023200502A1 (en) 2023-03-02
US10327092B2 (en) 2019-06-18
RU2017112527A (en) 2019-01-24
TW201909658A (en) 2019-03-01
AU2016202227A1 (en) 2016-05-05
US10057708B2 (en) 2018-08-21
PL2727383T3 (en) 2021-08-02
AR086775A1 (en) 2014-01-22
JP6174184B2 (en) 2017-08-02
JP6486995B2 (en) 2019-03-20
WO2013006338A3 (en) 2013-10-10
DK2727383T3 (en) 2021-05-25
CA2837893A1 (en) 2013-01-10
BR122020001361B1 (en) 2022-04-19
CA2837893C (en) 2017-08-29
AU2019204012A1 (en) 2019-07-11
IL291043B2 (en) 2023-03-01
HK1219604A1 (en) 2017-04-07
JP6637208B2 (en) 2020-01-29
US20180324543A1 (en) 2018-11-08
IL277736B (en) 2021-07-29
ES2871224T3 (en) 2021-10-28
TWI722342B (en) 2021-03-21
IL295733B2 (en) 2023-10-01
IL295733B1 (en) 2023-06-01
US20160381483A1 (en) 2016-12-29
US20140133683A1 (en) 2014-05-15
TW201325269A (en) 2013-06-16
AU2021258043A1 (en) 2021-11-25
TWI543642B (en) 2016-07-21
US9179236B2 (en) 2015-11-03
JP2014522155A (en) 2014-08-28
IL265741A (en) 2019-06-30
KR102406776B1 (en) 2022-06-10
KR20200058593A (en) 2020-05-27
EP2727383A2 (en) 2014-05-07
CN103650539A (en) 2014-03-19
RU2013158054A (en) 2015-08-10
US20190104376A1 (en) 2019-04-04
US20180027352A1 (en) 2018-01-25
JP2022058569A (en) 2022-04-12
US9467791B2 (en) 2016-10-11
JP6882618B2 (en) 2021-06-02
KR102185941B1 (en) 2020-12-03
US11412342B2 (en) 2022-08-09
JP6821854B2 (en) 2021-01-27
US20210219091A1 (en) 2021-07-15
US20160021476A1 (en) 2016-01-21
IL230046A (en) 2016-06-30
AU2018203734B2 (en) 2019-03-14
US9800991B2 (en) 2017-10-24
HUE054452T2 (en) 2021-09-28
US20190306652A1 (en) 2019-10-03
JP2023164976A (en) 2023-11-14
US20230045090A1 (en) 2023-02-09
EP2727383B1 (en) 2021-04-28
TWI603632B (en) 2017-10-21
KR20140017682A (en) 2014-02-11
AU2020226984A1 (en) 2020-09-17
AU2012279357B2 (en) 2016-01-14
CN105792086A (en) 2016-07-20
MX2013014684A (en) 2014-03-27
US9942688B2 (en) 2018-04-10
JP2020057014A (en) 2020-04-09
JP7348320B2 (en) 2023-09-20
KR20190086785A (en) 2019-07-23
IL291043B (en) 2022-11-01
RU2741738C1 (en) 2021-01-28
KR20200137034A (en) 2020-12-08
RU2617553C2 (en) 2017-04-25
CA2973703A1 (en) 2013-01-10
IL295733A (en) 2022-10-01
RU2731025C2 (en) 2020-08-28
CN103650539B (en) 2016-03-16
CA2973703C (en) 2022-06-21
US10165387B2 (en) 2018-12-25
TW201642673A (en) 2016-12-01
JP2019095813A (en) 2019-06-20
JP2021131562A (en) 2021-09-09
RU2017112527A3 (en) 2020-06-26
US10477339B2 (en) 2019-11-12
IL291043A (en) 2022-05-01
UA124570C2 (en) 2021-10-13

Similar Documents

Publication Publication Date Title
AU2021258043B2 (en) System and method for adaptive audio signal generation, coding and rendering
AU2012279357A1 (en) System and method for adaptive audio signal generation, coding and rendering
US11962997B2 (en) System and method for adaptive audio signal generation, coding and rendering

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201280032058.3

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12743261

Country of ref document: EP

Kind code of ref document: A2

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 2837893

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2012279357

Country of ref document: AU

Date of ref document: 20120627

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: MX/A/2013/014684

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2014518958

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20137034894

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 14130386

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: A201400839

Country of ref document: UA

WWE Wipo information: entry into national phase

Ref document number: 2012743261

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2013158054

Country of ref document: RU

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112013033386

Country of ref document: BR

WWE Wipo information: entry into national phase

Ref document number: 245574

Country of ref document: IL

ENP Entry into the national phase

Ref document number: 112013033386

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20131224