EP3987824A1 - Audio rendering for low frequency effects - Google Patents
Info
- Publication number
- EP3987824A1 (application EP20736832.5A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio data
- audio
- low frequency
- frequency effects
- soundfield
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
Definitions
- This disclosure relates to processing of media data, such as audio data.
- Audio rendering refers to a process of producing speaker feeds that configure one or more speakers (e.g., headphones, loudspeakers, other transducers including bone conducting speakers, etc.) to reproduce a soundfield represented by audio data.
- The audio data may conform to one or more formats, including scene-based audio formats (such as the format specified in the Moving Picture Experts Group (MPEG) MPEG-H audio coding standard), object-based audio formats, and/or channel-based audio formats.
- An audio playback device may apply an audio renderer to the audio data in order to generate or otherwise obtain the speaker feeds.
- The audio playback device may process the audio data to obtain one or more speaker feeds dedicated to reproducing low frequency effects (LFE, which may also be referred to as bass below a threshold frequency, such as 120 or 150 Hertz) that are potentially output to an LFE-capable speaker, such as a subwoofer.
- This disclosure relates generally to techniques directed to audio rendering for low frequency effects (LFE).
- Various aspects of the techniques may enable spatialized rendering of LFE to potentially improve reproduction of low frequency components (e.g., below a threshold frequency of 200 Hertz - Hz, 150 Hz, 120 Hz, or 100 Hz) of the soundfield.
- Various aspects of the techniques may analyze the audio data to identify spatial characteristics associated with the LFE components and process (e.g., render), based on the spatial characteristics, the audio data in various ways to possibly more accurately spatialize the LFE components within the soundfield.
- Various aspects of the techniques may improve operation of audio playback devices, as potentially more accurate spatialization of the LFE components within the soundfield may improve immersion and thereby the overall listening experience. Further, various aspects of the techniques may address issues in which the audio playback device may be configured to reconstruct the LFE components of the soundfield, when dedicated LFE channels are corrupted or otherwise incorrectly coded by the audio data, using LFE embedded in other middle (often referred to as mid) or high frequency components of the audio data, as described in greater detail throughout this disclosure. Through potentially more accurate reconstruction (in terms of spatialization), various aspects of the techniques may improve LFE audio rendering from mid or high frequency components of the audio data.
- In one example, the techniques are directed to a device comprising: a memory configured to store audio data representative of a soundfield; and one or more processors configured to: analyze the audio data to identify spatial characteristics of low frequency effects components of the soundfield; process, based on the spatial characteristics, the audio data to render a low frequency effects speaker feed; and output the low frequency effects speaker feed to a low frequency effects capable speaker.
- In another example, the techniques are directed to a method comprising: analyzing audio data representative of a soundfield to identify spatial characteristics of low frequency effects components of the soundfield; processing, based on the spatial characteristics, the audio data to render a low frequency effects speaker feed; and outputting the low frequency effects speaker feed to a low frequency effects capable speaker.
- In another example, the techniques are directed to a device comprising: means for analyzing audio data representative of a soundfield to identify spatial characteristics of low frequency effects components of the soundfield; means for processing, based on the spatial characteristics, the audio data to render a low frequency effects speaker feed; and means for outputting the low frequency effects speaker feed to a low frequency effects capable speaker.
- In another example, the techniques are directed to a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of a device to: analyze audio data representative of a soundfield to identify spatial characteristics of low frequency effects components of the soundfield; process, based on the spatial characteristics, the audio data to render a low frequency effects speaker feed; and output the low frequency effects speaker feed to a low frequency effects capable speaker.
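The analysis step common to these claims (identifying spatial characteristics of the LFE components) could be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes channel-based input with known loudspeaker directions and estimates an LFE direction as the low-band energy-weighted average of those directions. The function names and the 200 Hz cutoff are illustrative assumptions.

```python
import numpy as np

def estimate_lfe_direction(channels, positions, sr, cutoff_hz=200.0):
    """Estimate a spatial characteristic (a direction) of the LFE components
    by measuring low-frequency energy per channel and forming an
    energy-weighted average of the loudspeaker directions.

    channels:  (num_channels, num_samples) array of PCM samples
    positions: (num_channels, 2) array of (azimuth, elevation) in radians
    Returns (azimuth, elevation) in radians, or None if no LFE content.
    """
    spectra = np.fft.rfft(channels, axis=1)
    freqs = np.fft.rfftfreq(channels.shape[1], d=1.0 / sr)
    low = freqs < cutoff_hz
    energy = np.sum(np.abs(spectra[:, low]) ** 2, axis=1)
    total = energy.sum()
    if total == 0.0:
        return None  # no low-frequency content to localize
    weights = energy / total
    az, el = positions[:, 0], positions[:, 1]
    # Average unit direction vectors, then convert back to angles.
    x = np.sum(weights * np.cos(el) * np.cos(az))
    y = np.sum(weights * np.cos(el) * np.sin(az))
    z = np.sum(weights * np.sin(el))
    return np.arctan2(y, x), np.arctan2(z, np.hypot(x, y))
```

The estimated direction could then drive the "process, based on the spatial characteristics" step, for example by weighting or delaying the LFE feed relative to the listening position.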
- FIG. 1 is a block diagram illustrating an example system that may perform various aspects of the techniques described in this disclosure.
- FIG. 2 is a block diagram illustrating, in more detail, the LFE renderer unit shown in the example of FIG. 1.
- FIG. 3 is a block diagram illustrating, in more detail, another example of the LFE renderer unit shown in FIG. 1.
- FIG. 4 is a flowchart illustrating example operation of the LFE renderer unit shown in FIGS. 1-3 in performing various aspects of low frequency effects rendering techniques.
- FIG. 5 is a block diagram illustrating example components of the content consumer device 14 shown in the example of FIG. 1.
- The Moving Picture Experts Group (MPEG) has released a standard allowing for soundfields to be represented using a hierarchical set of elements (e.g., Higher-Order Ambisonic (HOA) coefficients) that can be rendered to speaker feeds for most speaker configurations, including 5.1 and 22.2 configurations, whether in locations defined by various standards or in non-uniform locations.
- MPEG released the standard as the MPEG-H 3D Audio standard, formally entitled “Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio,” set forth by ISO/IEC JTC 1/SC 29, with document identifier ISO/IEC DIS 23008-3, and dated July 25, 2014.
- MPEG also released a second edition of the 3D Audio standard, entitled “Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio,” set forth by ISO/IEC JTC 1/SC 29, with document identifier ISO/IEC 23008-3:201x(E), and dated October 12, 2016.
- Reference to the “3D Audio standard” in this disclosure may refer to one or both of the above standards.
- The soundfield may be represented using spherical harmonic coefficients (SHC). The following expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the soundfield, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r)\right] e^{j\omega t}.$$

- Here, $k = \omega/c$, $c$ is the speed of sound (~343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions (which may also be referred to as spherical basis functions) of order $n$ and suborder $m$. The term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform.
- Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.
- The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, derived from channel-based or object-based descriptions of the soundfield.
- The SHC (which may also be referred to as higher-order ambisonic (HOA) coefficients) represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving $(1+4)^2 = 25$ coefficients may be used.
- The SHC may be derived from a microphone recording using a microphone array.
- Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., “Three-Dimensional Surround Sound Systems Based on Spherical Harmonics,” J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.
- A number of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects).
- The coefficients contain information about the soundfield (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall soundfield in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$.
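As one concrete illustration of the object-to-soundfield transformation described above, a mono PCM object can be encoded into first-order ambisonic signals by weighting its samples with the real spherical harmonics evaluated at the object's direction, and multiple objects are mixed by summing their coefficient signals. The sketch below assumes ACN channel ordering with SN3D normalization; it is illustrative and not taken from the disclosure.

```python
import numpy as np

def encode_foa(samples, azimuth, elevation):
    """Encode a mono PCM object into first-order ambisonic (FOA) signals.

    Uses the real spherical harmonics of order n <= 1 in ACN channel
    order with SN3D normalization (W, Y, Z, X). Angles are in radians.
    Returns an array of shape (4, num_samples).
    """
    w = 1.0                                  # n=0, m=0  (omnidirectional)
    y = np.sin(azimuth) * np.cos(elevation)  # n=1, m=-1
    z = np.sin(elevation)                    # n=1, m=0
    x = np.cos(azimuth) * np.cos(elevation)  # n=1, m=+1
    return np.outer([w, y, z, x], samples)

def mix_objects(objects):
    """The soundfield for several PCM objects is the sum of the per-object
    coefficient signals, as noted in the text above."""
    return sum(encode_foa(s, az, el) for s, az, el in objects)
```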
- Scene-based audio formats, such as the above-noted SHC (which may also be referred to as higher-order ambisonic coefficients, or “HOA coefficients”), represent one way to represent a soundfield.
- Other possible formats include channel-based audio formats and object-based audio formats.
- Channel-based audio formats refer to the 5.1 surround sound format, 7.1 surround sound formats, 22.2 surround sound formats, or any other channel-based format that localizes audio channels to particular locations around the listener in order to recreate a soundfield.
- Object-based audio formats may refer to formats in which audio objects, often encoded using pulse-code modulation (PCM) and referred to as PCM audio objects, are specified in order to represent the soundfield.
- Such audio objects may include metadata identifying a location of the audio object relative to a listener or other point of reference in the soundfield, such that the audio object may be rendered to one or more speaker channels for playback in an effort to recreate the soundfield.
- The techniques described in this disclosure may apply to any of the foregoing formats, including scene-based audio formats, channel-based audio formats, object-based audio formats, or any combination thereof.
- FIG. 1 is a block diagram illustrating an example system that may perform various aspects of the techniques described in this disclosure.
- A system 10 includes a source device 12 and a content consumer device 14. While described in the context of the source device 12 and the content consumer device 14, the techniques may be implemented in any context in which audio data is used to reproduce a soundfield.
- The source device 12 may represent any form of computing device capable of generating the representation of a soundfield, and is generally described herein in the context of being a content creator device.
- The content consumer device 14 may represent any form of computing device capable of implementing the audio rendering techniques described in this disclosure as well as audio playback, and is generally described herein in the context of being an audio/visual (A/V) receiver.
- The source device 12 may be operated by an entertainment company or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some scenarios, the source device 12 may generate audio content in conjunction with video content, although such scenarios are not depicted in the example of FIG. 1 for ease of illustration purposes.
- The source device 12 includes a content capture device 300, a content editing device 304, and a soundfield representation generator 302.
- The content capture device 300 may be configured to interface or otherwise communicate with a microphone 5.
- The microphone 5 may represent an Eigenmike® or other type of 3D audio microphone capable of capturing and representing the soundfield as audio data 11, which may refer to one or more of the above-noted scene-based audio data (such as HOA coefficients), object-based audio data, and channel-based audio data. Although described as being a 3D audio microphone, the microphone 5 may also represent other types of microphones (such as omni-directional microphones, spot microphones, unidirectional microphones, etc.) configured to capture the audio data 11.
- The content capture device 300 may, in some examples, include an integrated microphone 5 that is integrated into the housing of the content capture device 300.
- The content capture device 300 may interface wirelessly or via a wired connection with the microphone 5.
- The content capture device 300 may process the audio data 11 after the audio data 11 is input via some type of removable storage, wirelessly, and/or via wired input processes.
- Various combinations of the content capture device 300 and the microphone 5 are possible in accordance with this disclosure.
- The content capture device 300 may also be configured to interface or otherwise communicate with the content editing device 304.
- The content capture device 300 may include the content editing device 304 (which in some instances may represent software or a combination of software and hardware, including the software executed by the content capture device 300 to configure the content capture device 300 to perform a specific form of content editing).
- The content editing device 304 may represent a unit configured to edit or otherwise alter content 301 received from the content capture device 300, including the audio data 11.
- The content editing device 304 may output edited content 303 and/or associated metadata 305 to the soundfield representation generator 302.
- The soundfield representation generator 302 may include any type of hardware device capable of interfacing with the content editing device 304 (or the content capture device 300). Although not shown in the example of FIG. 1, the soundfield representation generator 302 may use the edited content 303, including the audio data 11 and/or metadata 305, provided by the content editing device 304 to generate one or more bitstreams 21. In the example of FIG. 1, which focuses on the audio data 11, the soundfield representation generator 302 may generate one or more representations of the same soundfield represented by the audio data 11 to obtain a bitstream 21 that includes the representations of the soundfield and/or the audio metadata 305.
- The soundfield representation generator 302 may use a coding scheme for ambisonic representations of a soundfield, referred to as Mixed Order Ambisonics (MOA), as discussed in more detail in U.S. Application Serial No. 15/672,058, entitled “MIXED-ORDER AMBISONICS (MOA) AUDIO DATA FOR COMPUTER-MEDIATED REALITY SYSTEMS,” filed August 8, 2017, and published as U.S. patent publication no. 20190007781 on January 3, 2019.
- The soundfield representation generator 302 may generate a partial subset of the full set of HOA coefficients. For instance, each MOA representation generated by the soundfield representation generator 302 may provide precision with respect to some areas of the soundfield, but less precision in other areas.
- An MOA representation of the soundfield may include eight (8) uncompressed HOA coefficients, while the third-order HOA representation of the same soundfield may include sixteen (16) uncompressed HOA coefficients.
- Each MOA representation of the soundfield that is generated as a partial subset of the HOA coefficients may be less storage-intensive and less bandwidth-intensive (if and when transmitted as part of the bitstream 21 over the illustrated transmission channel) than the corresponding third-order HOA representation of the same soundfield generated from the full set of HOA coefficients.
- The techniques of this disclosure may also be performed with respect to full-order ambisonic (FOA) representations in which all of the HOA coefficients for a given order N are used to represent the soundfield.
- The soundfield representation generator 302 may represent the soundfield using all of the HOA coefficients for a given order N, resulting in a total number of HOA coefficients equaling $(N+1)^2$.
- The higher-order ambisonic audio data may include higher-order ambisonic coefficients associated with spherical basis functions having an order of one or less (which may be referred to as “1st order ambisonic audio data”), higher-order ambisonic coefficients associated with spherical basis functions having a mixed order and suborder (which may be referred to as the “MOA representation” discussed above), or higher-order ambisonic coefficients associated with spherical basis functions having an order greater than one (which is referred to above as the “FOA representation”).
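The coefficient counts referenced above (eight for the example MOA representation versus sixteen for third-order HOA, and $(N+1)^2$ in general for FOA) can be checked with a short calculation. The particular MOA subset modeled below (full 3D coefficients up to a vertical order, plus horizontal-only coefficients above it) is one common scheme assumed here for illustration; the disclosure does not specify the exact subset.

```python
def foa_coefficient_count(order):
    """Full-order ambisonics: all coefficients up to order N, (N+1)^2 total."""
    return (order + 1) ** 2

def moa_coefficient_count(horizontal_order, vertical_order):
    """One common mixed-order scheme keeps the full 3D coefficients up to
    the vertical order V and adds only the horizontal (|m| = n) coefficient
    pairs for orders V < n <= H, giving (V+1)^2 + 2*(H - V) channels.
    With H=3, V=1 this yields the 8-vs-16 comparison in the text."""
    v, h = vertical_order, horizontal_order
    return (v + 1) ** 2 + 2 * (h - v)
```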
- The content capture device 300 or the content editing device 304 may, in some examples, be configured to wirelessly communicate with the soundfield representation generator 302. In some examples, the content capture device 300 or the content editing device 304 may communicate, via one or both of a wireless connection or a wired connection, with the soundfield representation generator 302. Via the connection between the content capture device 300 and the soundfield representation generator 302, the content capture device 300 may provide content in various forms, which, for purposes of discussion, are described herein as being portions of the audio data 11.
- The content capture device 300 may leverage various aspects of the soundfield representation generator 302 (in terms of hardware or software capabilities of the soundfield representation generator 302).
- The soundfield representation generator 302 may include dedicated hardware configured to (or specialized software that, when executed, causes one or more processors to) perform psychoacoustic audio encoding (such as a unified speech and audio coder denoted as “USAC” set forth by the Moving Picture Experts Group (MPEG) or the MPEG-H 3D audio coding standard).
- The content capture device 300 may not include the psychoacoustic audio encoder dedicated hardware or specialized software and may instead provide audio aspects of the content 301 in a non-psychoacoustic-audio-coded form.
- The soundfield representation generator 302 may assist in the capture of content 301 by, at least in part, performing psychoacoustic audio encoding with respect to the audio aspects of the content 301.
- The soundfield representation generator 302 may also assist in content capture and transmission by generating one or more bitstreams 21 based, at least in part, on the audio content (e.g., MOA representations and/or third-order HOA representations) generated from the audio data 11 (in the case where the audio data 11 includes scene-based audio data).
- The bitstream 21 may represent a compressed version of the audio data 11 and any other different types of the content 301 (such as a compressed version of spherical video data, image data, or text data).
- The soundfield representation generator 302 may generate the bitstream 21 for transmission, as one example, across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like.
- The bitstream 21 may represent an encoded version of the audio data 11, and may include a primary bitstream and another side bitstream, which may be referred to as side channel information.
- The bitstream 21 representing the compressed version of the audio data 11 (which again may represent scene-based audio data, object-based audio data, channel-based audio data, or combinations thereof) may conform to bitstreams produced in accordance with the MPEG-H 3D audio coding standard.
- The content consumer device 14 may be operated by an individual, and may represent an A/V receiver client device. Although described with respect to an A/V receiver client device (which may also be referred to as an “A/V receiver,” an “AV receiver,” or an “AV receiver client device”), the content consumer device 14 may represent other types of devices, such as a virtual reality (VR) client device, an augmented reality (AR) client device, a mixed reality (MR) client device, a laptop computer, a desktop computer, a workstation, a cellular phone or handset (including a so-called “smartphone”), a television, a dedicated gaming system, a handheld gaming system, a smart speaker, a vehicle head unit (such as an infotainment or entertainment system for an automobile or other vehicle), or any other device capable of performing audio rendering with respect to audio data 15. As shown in the example of FIG. 1, the content consumer device 14 includes an audio playback system 16, which may refer to any form of audio playback system capable of rendering the audio data 15 for playback as multi-channel audio content.
- The source device 12 may output the bitstream 21 to an intermediate device positioned between the source device 12 and the content consumer device 14.
- The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream.
- The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder.
- The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 (possibly in conjunction with transmitting a corresponding video data bitstream) to subscribers, such as the content consumer device 14, requesting the bitstream 21.
- The source device 12 may store the bitstream 21 to a storage medium, such as a compact disc, a digital video disc, a high definition video disc, or other storage media, most of which are capable of being read by a computer and therefore may be referred to as computer-readable storage media or non-transitory computer-readable storage media.
- The transmission channel may refer to the channels by which content (e.g., in the form of one or more bitstreams 21) stored to the mediums is transmitted (and may include retail stores and other store-based delivery mechanisms).
- The techniques of this disclosure should not, therefore, be limited in this respect to the example of FIG. 1.
- The content consumer device 14 includes the audio playback system 16.
- The audio playback system 16 may represent any system capable of playing back multi-channel audio data.
- The audio playback system 16 may include a number of different renderers 22.
- The renderers 22 may each provide for a different form of rendering, where the different forms of rendering may include one or more of the various ways of performing vector-base amplitude panning (VBAP) and/or one or more of the various ways of performing soundfield synthesis.
- As used herein, “A and/or B” means “A or B,” or both “A and B.”
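For context, VBAP as mentioned above computes per-speaker gains by expressing the source direction as a linear combination of the adjacent loudspeaker direction vectors. A minimal two-speaker (horizontal-plane) sketch, following Pulkki's formulation, might look like the following; it is illustrative and not taken from the disclosure.

```python
import numpy as np

def vbap_2d_gains(source_az, speaker_az_pair):
    """Compute vector-base amplitude panning (VBAP) gains for a source
    panned between two loudspeakers in the horizontal plane (angles in
    radians).

    Solves p = g1*l1 + g2*l2 for the gains, where p is the source unit
    vector and l1, l2 are the loudspeaker unit vectors, then
    power-normalizes so that g1^2 + g2^2 = 1.
    """
    p = np.array([np.cos(source_az), np.sin(source_az)])
    # Loudspeaker unit vectors as columns of the base matrix L.
    L = np.column_stack([[np.cos(a), np.sin(a)] for a in speaker_az_pair])
    g = np.linalg.solve(L, p)
    return g / np.linalg.norm(g)
```

For a source midway between speakers at +45° and -45°, this yields equal gains of about 0.707 on each speaker, preserving constant power as the source pans.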
- The audio playback system 16 may further include an audio decoding device 24.
- The audio decoding device 24 may represent a device configured to decode the bitstream 21 to output audio data 15.
- The audio data 15 may include scene-based audio data that, in some examples, may form the full second- or higher-order HOA representation or a subset thereof that forms an MOA representation of the same soundfield, decompositions thereof (such as the predominant audio signal, ambient HOA coefficients, and the vector-based signal described in the MPEG-H 3D Audio coding standard), or other forms of scene-based audio data.
- The audio data 15 may be similar to a full set or a partial subset of the audio data 11, but may differ due to lossy operations (e.g., quantization) and/or transmission via the transmission channel.
- The audio data 15 may include, as an alternative to, or in conjunction with, the scene-based audio data, channel-based audio data.
- The audio data 15 may include, as an alternative to, or in conjunction with, the scene-based audio data, object-based audio data.
- The audio data 15 may include any combination of scene-based audio data, object-based audio data, and channel-based audio data.
- The audio renderers 22 of the audio playback system 16 may, after the audio decoding device 24 has decoded the bitstream 21 to obtain the audio data 15, render the audio data 15 to output speaker feeds 25.
- The speaker feeds 25 may drive one or more speakers (which are not shown in the example of FIG. 1 for ease of illustration purposes).
- Various audio representations, including scene-based audio data (and possibly channel-based audio data and/or object-based audio data) of a soundfield, may be normalized in a number of ways, including N3D, SN3D, FuMa, N2D, or SN2D.
- The audio playback system 16 may obtain speaker information 13 indicative of a number of speakers (e.g., loudspeakers or headphone speakers) and/or a spatial geometry of the speakers. In some instances, the audio playback system 16 may obtain the speaker information 13 using a reference microphone and driving the speakers in such a manner as to dynamically determine the speaker information 13. In other instances, or in conjunction with the dynamic determination of the speaker information 13, the audio playback system 16 may prompt a user to interface with the audio playback system 16 and input the speaker information 13.
- The audio playback system 16 may select one of the audio renderers 22 based on the speaker information 13. In some instances, the audio playback system 16 may, when none of the audio renderers 22 are within some threshold similarity measure (in terms of the speaker geometry) to the speaker geometry specified in the speaker information 13, generate the one of the audio renderers 22 based on the speaker information 13. The audio playback system 16 may, in some instances, generate one of the audio renderers 22 based on the speaker information 13 without first attempting to select an existing one of the audio renderers 22.
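The select-or-generate logic described above might be sketched as follows, using mean absolute angular error as a hypothetical threshold similarity measure. The renderer representation and the generate_renderer fallback are placeholders, not the disclosure's design.

```python
import numpy as np

def generate_renderer(speaker_geometry):
    """Placeholder for generating a renderer from measured speaker info."""
    return ("generated", tuple(speaker_geometry))

def select_or_generate_renderer(renderers, speaker_geometry, threshold_rad=0.17):
    """Select an existing renderer whose assumed speaker geometry is within
    a threshold similarity (mean absolute angular error, ~10 degrees by
    default) of the measured geometry; otherwise generate a new one.

    renderers: list of (renderer, geometry) pairs, geometry being an array
               of speaker azimuths in radians.
    """
    best, best_err = None, np.inf
    for renderer, geometry in renderers:
        if len(geometry) != len(speaker_geometry):
            continue  # cannot compare geometries with different speaker counts
        err = np.mean(np.abs(np.asarray(geometry) - np.asarray(speaker_geometry)))
        if err < best_err:
            best, best_err = renderer, err
    if best is not None and best_err <= threshold_rad:
        return best
    return generate_renderer(speaker_geometry)
```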
- the audio playback system 16 may utilize one of the Tenderers 22 that provides for binaural rendering using head- related transfer functions (HRTF) or other functions capable of rendering to left and right speaker feeds 25 for headphone speaker playback, such as binaural room impulse response (BRIR) Tenderers.
- HRTF head-related transfer functions
- BRIR binaural room impulse response
- the terms“speakers” or“transducer” may generally refer to any speaker, including loudspeakers, headphone speakers, bone-conducting speakers, earbud speakers, wireless headphone speakers, etc.
- One or more speakers may then playback the rendered speaker feeds 25.
- rendering of the speaker feeds 25 may refer to other types of rendering, such as rendering incorporated directly into the decoding of the audio data 15 from the bitstream 21.
- An example of the alternative rendering can be found in Annex G of the MPEG-H 3D audio coding standard, where rendering occurs during the predominant signal formulation and the background signal formation prior to composition of the soundfield.
- reference to rendering of the audio data 15 should be understood to refer to rendering of either the actual audio data 15 or decompositions or other representations of the audio data 15 (such as the above noted predominant audio signal, the ambient HOA coefficients, and/or the vector-based signal - which may also be referred to as a V-vector).
- the audio data 11 may represent a soundfield including what is referred to as low frequency effects (LFE) components, which may also be referred to as bass below a certain threshold frequency (such as 200 Hertz (Hz), 150 Hz, 120 Hz, or 100 Hz).
- Audio data conforming to some audio formats, such as the channel-based audio formats, may include a dedicated LFE channel (which is usually denoted as dot one - “X.1” - meaning a single dedicated LFE channel with X main channels, such as center, front left, front right, back left and back right when X is equal to five, “X.2” referring to two dedicated LFE channels, etc.).
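The "X.1" notation above can be made concrete with a short sketch; the layout below and the helper names are illustrative assumptions, not defined by the patent:

```python
# Illustrative sketch (not from the patent): a hypothetical 5.1 channel
# layout, where the ".1" denotes the single dedicated LFE channel.
LAYOUT_5_1 = [
    "front_left", "front_right", "center", "lfe", "back_left", "back_right",
]

def count_lfe_channels(layout):
    """Count dedicated LFE channels, i.e. the 'dot one' in X.1 / X.2."""
    return sum(1 for name in layout if name == "lfe")

def main_channel_count(layout):
    """Count the X main channels that accompany the LFE channel(s)."""
    return len(layout) - count_lfe_channels(layout)

print(main_channel_count(LAYOUT_5_1), count_lfe_channels(LAYOUT_5_1))  # 5 1
```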
- Audio data conforming to object-based audio formats may define one or more audio objects and the location of each of the audio objects in the soundfield, which are then transformed into channels that are mapped to the individual speakers, including any subwoofers should sufficient LFE components be present (e.g., below approximately 200 Hz) in the soundfield.
- the audio playback system 16 may process each audio object, performing a distance measure to identify a distance from which the LFE components originate, applying a low-pass filter to extract any LFE components below a threshold (e.g., 200 Hz), performing bass activity detection to identify the LFE components, etc.
- the audio playback system 16 may then render one or more LFE speaker feeds before processing the LFE speaker feeds to perform dynamic range control, the output of which results in adjusted LFE speaker feeds.
- Audio data conforming to the scene-based audio formats may define the soundfield as one or more higher order ambisonic (HOA) coefficients, which are associated with spherical basis functions having an order and suborder greater than or equal to zero.
- the audio playback system 16 may render the HOA coefficients to speaker feeds located equidistant about a sphere (at so-called Fliege-Maier points) around a sweet spot (which is another way of referring to an intended listening location) at the center of the sphere.
- the audio playback system 16 may process each of the rendered speaker feeds in a similar manner to that described above with respect to the audio data conforming to the object-based formats, resulting in adjusted LFE speaker feeds.
- the audio playback system 16 may equally process each of the channels (either provided in the case of channel-based audio data or rendered in the case of scene-based audio data) and/or audio objects to obtain the adjusted LFE speaker feeds.
- Each of the channels and/or audio objects are processed equally because a human auditory system is generally considered to be insensitive to a directionality and shape of LFE components of the soundfield, as the LFE components are generally felt (as vibrations) rather than distinctly heard compared to higher frequency components of the soundfield, which can be distinctly localized by the human auditory system.
- LFE-capable speakers may refer to full frequency speakers, such as large center speakers, large front right speakers, large front left speakers, etc., in addition to one or more subwoofers - where the use of two or more subwoofers is increasingly common, especially in cinemas and other dedicated viewing and/or listening areas, such as in-home cinemas or listening rooms.
- the lack of spatialization of LFE components may be sensed by the human auditory system.
- viewers and/or listeners may notice a degradation in immersion when the LFE components are not correctly spatialized when reproduced, where such degradation may be detected when an associated scene being viewed does not correctly match with the reproduction of the LFE components.
- the degradation may further be increased when the LFE channel is corrupted (for channel-based audio data) or when the LFE channel is not provided (as may be the case for object-based audio data and/or scene-based audio data).
- Reconstruction of the LFE channel may involve mixing all of the higher frequency channels together (after rendering the audio objects and/or HOA coefficients to the channels when applicable) and outputting the mixed channels to the LFE-capable speaker, which may not be full band (in terms of frequency) and thereby produce an inaccurate reproduction of the LFE components given that the high frequency components of the mixed channels may muddy or otherwise render the reproduction inaccurate.
- additional processing may be performed to reproduce the LFE speaker feeds, but such processing neglects the spatialization aspect and outputs the same LFE speaker feed to each of the LFE-capable speakers, which again may be sensed by the human auditory system as being inaccurate.
- the audio playback system 16 may perform spatialized rendering of LFE components to potentially improve reproduction of the LFE components (e.g., below a threshold frequency of 200 Hertz (Hz), 150 Hz, 120 Hz, or 100 Hz) of the soundfield. Rather than process all aspects of the audio data equally to obtain the LFE speaker feeds, the audio playback system 16 may analyze the audio data 15 to identify spatial characteristics associated with the LFE components, and process (e.g., render), based on the spatial characteristics, the audio data in various ways to possibly more accurately spatialize the LFE components within the soundfield.
- the audio playback system 16 may include an LFE renderer unit 26, which may represent a unit configured to spatialize the LFE components of the audio data 15 in accordance with various aspects of the techniques described in this disclosure.
- the LFE renderer unit 26 may analyze the audio data 15 to identify spatial characteristics of the LFE components of the soundfield.
- the LFE renderer unit 26 may generate, based on the audio data 15, a spherical heat map (which may also be referred to as an “energy map”) reflecting acoustical energy levels within the soundfield for one or more frequency ranges (e.g., from zero Hz to 200 Hz, 150 Hz, or 120 Hz).
- the LFE renderer unit 26 may then identify, based on the spherical heatmap, the spatial characteristics of the LFE components of the soundfield. For example, the LFE renderer unit 26 may identify a direction and shape of the LFE components based on where there are higher-energy LFE components in the soundfield relative to other locations within the soundfield. The LFE renderer unit 26 may next process, based on the identified direction, shape, and/or other spatial characteristics, the audio data 15 to render an LFE speaker feed 27.
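A minimal sketch of the heat-map idea, collapsed to a horizontal ring of channels: each channel contributes low-band energy at an assumed azimuth, and the dominant LFE direction is the azimuth with the most energy. The channel names, angles, and energy measure below are illustrative assumptions; the patent's heat map is spherical:

```python
# Hedged sketch: identify the dominant LFE direction from per-channel
# low-band frames. Channel azimuths (degrees) are illustrative assumptions.
CHANNEL_AZIMUTH = {"front_left": 30.0, "front_right": -30.0,
                   "center": 0.0, "back_left": 110.0, "back_right": -110.0}

def energy(frame):
    """Sum-of-squares energy of one frame of samples."""
    return sum(s * s for s in frame)

def dominant_lfe_direction(low_band_frames):
    """low_band_frames maps channel name -> already low-pass-filtered samples.
    Returns (azimuth_degrees, energy) of the strongest LFE contribution -
    a crude 1-D stand-in for the spherical heat map described above."""
    heat_map = {CHANNEL_AZIMUTH[ch]: energy(frame)
                for ch, frame in low_band_frames.items()}
    azimuth = max(heat_map, key=heat_map.get)
    return azimuth, heat_map[azimuth]

frames = {"front_left": [0.5, -0.4, 0.5], "front_right": [0.1, 0.0, -0.1],
          "center": [0.2, 0.1, -0.2]}
azimuth, e = dominant_lfe_direction(frames)
print(azimuth)  # 30.0
```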
- the LFE renderer unit 26 may then output the LFE speaker feed 27 to an LFE-capable speaker (which is not shown in the example of FIG. 1 for ease of illustration purposes).
- the audio playback device 16 may mix the LFE speaker feeds 27 with one or more of the speaker feeds 25 to obtain mixed speaker feeds, which are then output to one or more LFE-capable speakers.
- various aspects of the techniques may improve operation of the audio playback device 16, as potentially more accurate spatialization of the LFE components within the soundfield may improve immersion and thereby the overall listening experience. Further, various aspects of the techniques may address issues in which the audio playback device 16 may be configured to reconstruct the LFE components of the soundfield when dedicated LFE channels are corrupted or otherwise incorrectly coded by the audio data, using LFE embedded in other middle (often referred to as mid) or high frequency components of the audio data 15. Through potentially more accurate reconstruction (in terms of spatialization), various aspects of the techniques may improve LFE audio rendering from mid or high frequency components of the audio data 15.
- FIG. 2 is a block diagram illustrating, in more detail, the LFE renderer unit shown in the example of FIG. 1.
- the LFE renderer unit 26A represents one example of the LFE renderer unit 26 shown in the example of FIG. 1, where the LFE renderer unit 26A includes a spatialized LFE analyzer 110, a distance measure unit 112, a low-pass filter 114, a bass activity detection unit 116, a rendering unit 118, and a dynamic range control (DRC) unit 120.
- the spatialized LFE analyzer 110 may represent a unit configured to identify the spatial characteristics (“SC”) 111 of the LFE components of the soundfield represented by the audio data 15.
- the spatialized LFE analyzer 110 may obtain the audio data 15 and analyze the audio data 15 to identify the SC 111.
- the spatialized LFE analyzer 110 may analyze the full frequency audio data 15 to produce the spherical heatmap, representative of the directional acoustic energy (which may also be referred to as level or gain) surrounding the sweet spot.
- the spatialized LFE analyzer 110 may then identify, based on the spherical heatmap, the SC 111 of the LFE components of the soundfield.
- the SC 111 of the LFE component may include one or more directions (e.g., a direction of arrival), one or more associated shapes, and the like.
- the spatialized LFE analyzer 110 may generate the spherical heatmap in a number of different ways depending on the format of the audio data 15.
- the spatialized LFE analyzer 110 may directly produce the spherical heatmap from the channels, where each channel is defined as residing at a distinct location in space (e.g., as part of the 5.1 audio format).
- the LFE analyzer 110 may forgo generation of the spherical heatmap, as the object metadata may directly define a location at which the associated object resides.
- the LFE analyzer 110 may process all of the objects to identify which of the objects contribute to the LFE components of the soundfield, and identify the SC 111 based on the object metadata associated with the identified objects.
- the spatialized LFE analyzer 110 may transform the object audio data 15 from the spatial domain to the spherical harmonic domain, producing HOA coefficients representative of each of the objects.
- the spatialized LFE analyzer 110 may next mix all of the HOA coefficients from each of the objects together, and transform the HOA coefficients from the spherical harmonic domain back to the spatial domain, producing channels (or, in other words, render the HOA coefficients into channels).
- the rendered channels may be equally spaced about a sphere surrounding the listener.
- the rendered channels may form the basis for the spherical heatmap.
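The object-to-HOA-to-channels round trip described above might be sketched, at first order and restricted to the horizontal plane, as follows. The encoding convention and the simple mode-matching decoder are assumptions rather than the patent's actual transform:

```python
import math

# Hedged first-order, horizontal-only sketch of the object -> spherical
# harmonic domain -> channels round trip. Conventions are assumptions.
def encode_object(samples, azimuth_rad):
    """Encode a mono object into first-order horizontal ambisonics (W, X, Y)."""
    w = list(samples)
    x = [s * math.cos(azimuth_rad) for s in samples]
    y = [s * math.sin(azimuth_rad) for s in samples]
    return w, x, y

def mix(coeff_sets):
    """Mix the ambisonic coefficients of several objects together."""
    n = len(coeff_sets[0][0])
    mixed = [[0.0] * n for _ in range(3)]
    for w, x, y in coeff_sets:
        for i in range(n):
            mixed[0][i] += w[i]; mixed[1][i] += x[i]; mixed[2][i] += y[i]
    return mixed

def render_to_ring(mixed, num_speakers):
    """Render to num_speakers channels equally spaced on a horizontal ring."""
    w, x, y = mixed
    channels = []
    for k in range(num_speakers):
        phi = 2.0 * math.pi * k / num_speakers
        channels.append([(w[i] + 2.0 * (x[i] * math.cos(phi)
                                        + y[i] * math.sin(phi)))
                         / num_speakers for i in range(len(w))])
    return channels

# An object straight ahead (azimuth 0) should land loudest in channel 0.
mixed = mix([encode_object([1.0, 0.5], 0.0)])
ring = render_to_ring(mixed, 4)
loudest = max(range(4), key=lambda k: sum(s * s for s in ring[k]))
print(loudest)  # 0
```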
- the spatialized LFE analyzer 110 may perform a similar operation to that described above in the instance of scene-based audio data (referring to the rendering of the channels from the HOA coefficients that are then used to generate the spherical heatmap, which again may also be referred to as an energy map).
- the spatialized LFE analyzer 110 may output the SC 111 to one or more of the distance measure unit 112, the low-pass filter 114, the bass activity detection unit 116, the rendering unit 118, and/or the dynamic range control unit 120.
- the distance measure unit 112 may determine a distance between where the LFE component is originating (as indicated by the SC 111 or derived therefrom) and each LFE-capable speaker.
- the distance measure unit 112 may then select the one of the LFE-capable speakers having the smallest determined distance. When there is only a single LFE-capable speaker, the LFE renderer unit 26A may not invoke the distance measure unit 112 to compute or otherwise determine the distance.
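The distance measure above amounts to a nearest-speaker selection; a sketch, with assumed 2-D speaker coordinates (metres, listener at the origin):

```python
import math

# Sketch of the distance measure: pick the LFE-capable speaker closest
# to where the LFE component originates. Positions are illustrative.
def nearest_lfe_speaker(source_pos, speakers):
    """speakers maps name -> (x, y). Returns the name with the smallest
    Euclidean distance to source_pos."""
    if len(speakers) == 1:                     # single speaker: skip the measure
        return next(iter(speakers))
    return min(speakers,
               key=lambda name: math.dist(source_pos, speakers[name]))

speakers = {"sub_front": (0.0, 2.0), "sub_back": (0.0, -2.0)}
print(nearest_lfe_speaker((1.0, 1.5), speakers))  # sub_front
```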
- the low-pass filter 114 may represent a unit configured to perform low-pass filtering with respect to the audio data 15 to obtain LFE components of the audio data 15. To conserve processing cycles and thereby promote more efficient operation (with the associated benefits of lower power consumption, bandwidth - including memory bandwidth - utilization, etc.), the low-pass filter 114 may select only those channels (for channel-based audio data) from the direction identified by the SC 111. However, in some examples, the low-pass filter 114 may apply a low-pass filter to the entirety of the audio data 15 to obtain the LFE components. The low-pass filter 114 may output the LFE components to the bass activity detection unit 116.
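The patent does not specify a filter design; as one possibility, a one-pole low-pass filter could extract content below the threshold (a practical system would likely use a steeper crossover):

```python
import math

# A minimal one-pole low-pass filter sketch for extracting LFE content
# below a cutoff (e.g. 200 Hz). The filter design is an assumption.
def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y = y + alpha * (x - y)   # y[n] = y[n-1] + a * (x[n] - y[n-1])
        out.append(y)
    return out

# A 50 Hz tone passes with far more energy than a 5 kHz tone.
sr = 48_000
low = [math.sin(2 * math.pi * 50 * n / sr) for n in range(sr // 10)]
high = [math.sin(2 * math.pi * 5000 * n / sr) for n in range(sr // 10)]
e = lambda s: sum(v * v for v in s)
print(e(one_pole_lowpass(low, 200, sr)) > 10 * e(one_pole_lowpass(high, 200, sr)))  # True
```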
- the bass activity detection unit 116 may represent a unit configured to detect whether a given frame of the LFE component includes bass or not.
- the bass activity detection unit 116 may apply a noise floor threshold (e.g., 20 decibels - dB) to each frame of the LFE component.
- the bass activity detection unit 116 may use a histogram (over time) to set a dynamic noise floor threshold.
- the bass activity detection unit 116 may indicate that the LFE component is active for the current frame and is to be rendered.
- the bass activity detection unit 116 may indicate that the LFE component is not active for the current frame and is not to be rendered.
- the bass activity detection unit 116 may output this indication to rendering unit 118.
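The noise-floor gate might look like the following sketch, where the dB reference and the 20 dB floor are illustrative assumptions (a dynamic, histogram-derived floor could replace the fixed one):

```python
import math

# Sketch of the bass-activity gate: a frame of the LFE component is
# "active" when its level exceeds a noise-floor threshold. Both the
# reference level and the 20 dB floor below are assumptions.
def frame_level_db(frame, reference=1e-9):
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, reference) / reference)

def bass_active(frame, noise_floor_db=20.0):
    """Returns True when the frame should be rendered, False when skipped."""
    return frame_level_db(frame) > noise_floor_db

print(bass_active([0.0] * 64))        # False: silence stays below the floor
print(bass_active([0.5, -0.5] * 32))  # True: strong bass frame
```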
- the rendering unit 118 may render, based on the SC 111 and the speaker information 13, the LFE-capable speaker feeds 27. That is, for channel-based audio data, the rendering unit 118 may weight the channels according to the SC 111 to potentially emphasize a direction from which the LFE component is originating in the soundfield. As such, the rendering unit 118 may apply, based on the SC 111, a first weight to a first audio channel of a number of audio channels that is different than a second weight applied to a second audio channel of the number of audio channels to obtain a first weighted audio channel.
- the rendering unit 118 may next mix the first weighted audio channel with a second weighted audio channel obtained by applying the second weight to the second audio channel to obtain a mixed audio channel. The rendering unit 118 may then obtain, based on the mixed audio channel, the one or more LFE-capable speaker feeds 27.
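The weight-and-mix step described above can be sketched as follows; the weights below are illustrative rather than derived from the SC 111:

```python
# Sketch of the weighted mix: emphasize channels nearer the LFE
# direction of arrival. The weights here are illustrative assumptions.
def weighted_mix(channels, weights):
    """channels: list of equal-length sample lists; weights: one per channel.
    Returns the mixed channel sum_i weights[i] * channels[i]."""
    n = len(channels[0])
    return [sum(w * ch[i] for w, ch in zip(weights, channels))
            for i in range(n)]

front_left = [0.8, 0.6]
front_right = [0.2, 0.1]
# LFE energy localized to the left: weight the left channel more heavily.
mixed = weighted_mix([front_left, front_right], [0.9, 0.1])
print([round(v, 6) for v in mixed])  # [0.74, 0.55]
```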
- the rendering unit 118 may adjust an object rendering matrix to account for the direction of arrival of the LFE component, using the SC 111 as the direction of arrival.
- the rendering unit 118 may adjust a similar HOA rendering matrix to account for the direction of arrival of the LFE component, again using the SC 111 as the direction of arrival.
- the rendering unit 118 may utilize the speaker information 13 to determine various aspects of the rendering weights/matrix (as well as any delays, crossover, etc.) to account for differences between the specified locations of the speakers (such as by the 5.1 format) and the actual locations of the LFE-capable speakers.
- the rendering unit 118 may perform various types of rendering, such as object-based rendering types including vector based amplitude panning (VBAP), distance-based amplitude panning (DBAP), and/or ambisonic-based rendering types.
- the rendering unit 118 may perform VBAP, DBAP, and/or the ambisonic-based rendering types so as to create an audible appearance of a virtual speaker located at the direction of arrival defined by the SC 111.
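A two-speaker, horizontal-plane VBAP sketch of the virtual-speaker idea (VBAP is the general published technique; the patent does not give a specific formulation, so the details below are assumptions):

```python
import math

# Hedged 2-D VBAP sketch: compute the gain pair that places a virtual
# source between two speakers at a given direction of arrival.
def vbap_pair_gains(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Solve [l1 l2] g = p for g, then normalize so |g| = 1."""
    to_vec = lambda az: (math.cos(math.radians(az)), math.sin(math.radians(az)))
    (x1, y1), (x2, y2) = to_vec(spk1_az_deg), to_vec(spk2_az_deg)
    px, py = to_vec(source_az_deg)
    det = x1 * y2 - x2 * y1                      # 2x2 matrix inverse
    g1 = (px * y2 - py * x2) / det
    g2 = (py * x1 - px * y1) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Source midway between speakers at +30 and -30 degrees: equal gains.
g1, g2 = vbap_pair_gains(0.0, 30.0, -30.0)
print(round(g1, 3), round(g2, 3))  # 0.707 0.707
```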
- the rendering unit 118 may be configured to process, based on the SC 111, the audio data to render a first low frequency effects speaker feed and a second low frequency effects speaker feed, the first low frequency effects speaker feed being different than the second low frequency effects speaker feed. Rather than render different low frequency effects speaker feeds, the rendering unit 118 may perform VBAP to localize the direction of arrival of the low frequency effects components.
- the rendering unit 118 may refrain from rendering the current frame. In any event, the rendering unit 118 may output, when the LFE component is indicated as being active, the LFE-capable speaker feeds 27 to the dynamic range control (DRC) unit 120.
- the dynamic range control unit 120 may ensure that the dynamic range of the LFE-capable speaker feeds 27 remains within a maximum gain to avoid damaging the LFE-capable speakers. As the tolerances may differ on a per speaker basis, the dynamic range control unit 120 may ensure that the LFE-capable speaker feeds 27 remain below a maximum gain defined for each of the LFE-capable speakers (or identified automatically by the dynamic range control unit 120 or other components within the audio playback system 16). The dynamic range control unit 120 may output the adjusted LFE-capable speaker feeds 27 to the LFE-capable speakers.
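The per-speaker gain limiting could be sketched as a simple peak limiter; a real DRC stage would likely apply smoothed gain reduction rather than per-frame scaling, and the max-gain value below is an illustrative assumption:

```python
# Sketch of the dynamic range control: keep each LFE-capable speaker
# feed below a per-speaker maximum gain. max_gain is illustrative.
def limit_feed(feed, max_gain):
    """Scale the whole frame down if its peak exceeds max_gain."""
    peak = max(abs(s) for s in feed)
    if peak <= max_gain:
        return list(feed)
    scale = max_gain / peak
    return [s * scale for s in feed]

feed = [0.2, 1.6, -2.0]
limited = limit_feed(feed, max_gain=1.0)
print(max(abs(s) for s in limited))  # 1.0
```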
- FIG. 3 is a block diagram illustrating, in more detail, another example of the LFE renderer unit shown in FIG. 1.
- the LFE renderer unit 26B represents one example of the LFE renderer unit 26 shown in the example of FIG. 1, where the LFE renderer unit 26B includes the same spatialized LFE analyzer 110, the distance measure unit 112, the low-pass filter 114, the bass activity detection unit 116, the rendering unit 118, and the dynamic range control (DRC) unit 120 as discussed above with respect to the LFE renderer unit 26A.
- the LFE renderer unit 26B differs from the LFE renderer unit 26A, as the bass activity detection unit 116 is first to process the audio data 15, thereby potentially improving processing efficiency given that frames having no bass activity are skipped, avoiding processing by the spatialized LFE analyzer 110, the distance measure unit 112, and the low-pass filter 114.
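The FIG. 3 ordering is essentially an early-exit pipeline; a control-flow sketch with stand-in predicates (the function names are illustrative, not the patent's units):

```python
# Sketch of the FIG. 3 ordering: run the cheap bass-activity check first
# and skip the more expensive spatial analysis for frames with no bass.
def process_frame(frame, has_bass, spatial_analysis):
    """Returns the rendered result, or None when the frame is skipped."""
    if not has_bass(frame):
        return None                # skip analyzer / distance measure / filter
    return spatial_analysis(frame)

calls = []
analyze = lambda f: calls.append(f) or "rendered"
assert process_frame("quiet", lambda f: False, analyze) is None
assert process_frame("bassy", lambda f: True, analyze) == "rendered"
print(calls)  # ['bassy']  - the quiet frame never reached the analyzer
```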
- FIG. 4 is a flowchart illustrating example operation of the LFE renderer unit shown in FIGS. 1-3 in performing various aspects of low frequency effects rendering techniques.
- the LFE renderer unit 26 may analyze the audio data 15 representative of a soundfield to identify the SC 111 of low frequency effects components of the soundfield (200). To perform the analysis, the LFE renderer unit 26 may generate, based on the audio data 15, a spherical heatmap representative of energy surrounding a listener located at a middle of a sphere (in the sweet spot). The LFE renderer unit 26 may select a direction at which the most energy is localized, as described above in more detail.
- the LFE renderer unit 26 may next process, based on the SC 111, the audio data to render one or more low frequency effects speaker feeds (202). As discussed above with respect to the example of FIG. 2, the LFE renderer unit 26 may adapt the rendering unit 118 to differently weight each channel (for channel-based audio data), object (for object-based audio data), and/or various HOA coefficients (for scene-based audio data) based on the SC 111.
- the LFE renderer unit 26 may configure the rendering unit 118 to weight a right channel higher than a left channel (or to entirely discard the left channel as it may have little to no LFE components).
- the LFE renderer unit 26 may configure the rendering unit 118 to weight an object responsible for the majority of the energy (and whose metadata indicates that the object resides on the right) over an object to the left of the listener (or to discard the object to the left of the listener).
- the LFE renderer unit 26 may configure the rendering unit 118 to weight right channels rendered from the HOA coefficients over left channels rendered from the HOA coefficients.
- the LFE renderer unit 26 may output the low frequency effects speaker feed 27 to a low frequency effects capable speaker (204).
- the techniques may be performed with respect to mixed format audio data in which there are two or more of channel-based audio data, object-based audio data, or scene-based audio data for the same frame of time.
- FIG. 5 is a block diagram illustrating example components of the content consumer device 14 shown in the example of FIG. 1.
- the content consumer device 14 includes a processor 412, a graphics processing unit (GPU) 414, system memory 416, a display processor 418, one or more integrated speakers 105, a display 103, a user interface 420, and a transceiver module 422.
- the display processor 418 is a mobile display processor (MDP).
- the processor 412, the GPU 414, and the display processor 418 may be formed as an integrated circuit (IC).
- the IC may be considered as a processing chip within a chip package and may be a system-on-chip (SoC).
- two of the processor 412, the GPU 414, and the display processor 418 may be housed together in the same IC and the other in a different integrated circuit (i.e., different chip packages), or all three may be housed in different ICs or on the same IC.
- the processor 412, the GPU 414, and the display processor 418 are all housed in different integrated circuits in examples where the content consumer device 14 is a mobile device.
- Examples of the processor 412, the GPU 414, and the display processor 418 include, but are not limited to, fixed function and/or programmable processing circuitry such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- the processor 412 may be the central processing unit (CPU) of the content consumer device 14.
- the GPU 414 may be specialized hardware that includes integrated and/or discrete logic circuitry that provides the GPU 414 with massive parallel processing capabilities suitable for graphics processing.
- the GPU 414 may also include general purpose processing capabilities, and may be referred to as a general-purpose GPU (GPGPU) when implementing general purpose processing tasks (i.e., non-graphics related tasks).
- the display processor 418 may also be specialized integrated circuit hardware that is designed to retrieve image content from the system memory 416, compose the image content into an image frame, and output the image frame to the display 103.
- the processor 412 may execute various types of the applications 20. Examples of the applications 20 include web browsers, e-mail applications, spreadsheets, video games, other applications that generate viewable objects for display, or any of the application types listed in more detail above.
- the system memory 416 may store instructions for execution of the applications 20. The execution of one of the applications 20 on the processor 412 causes the processor 412 to produce graphics data for image content that is to be displayed and the audio data 21 that is to be played (possibly via integrated speaker 105).
- the processor 412 may transmit graphics data of the image content to the GPU 414 for further processing based on instructions or commands that the processor 412 transmits to the GPU 414.
- the processor 412 may communicate with the GPU 414 in accordance with a particular application programming interface (API).
- APIs include the DirectX® API by Microsoft®, the OpenGL® or OpenGL ES® by the Khronos Group, and OpenCL™; however, aspects of this disclosure are not limited to the DirectX, the OpenGL, or the OpenCL APIs, and may be extended to other types of APIs.
- the techniques described in this disclosure are not required to function in accordance with an API, and the processor 412 and the GPU 414 may utilize any technique for communication.
- the system memory 416 may be the memory for the content consumer device 14.
- the system memory 416 may comprise one or more computer-readable storage media. Examples of the system memory 416 include, but are not limited to, a random-access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), flash memory, or other medium that can be used to carry or store desired program code in the form of instructions and/or data structures and that can be accessed by a computer or a processor.
- system memory 416 may include instructions that cause the processor 412, the GPU 414, and/or the display processor 418 to perform the functions ascribed in this disclosure to the processor 412, the GPU 414, and/or the display processor 418. Accordingly, the system memory 416 may be a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors (e.g., the processor 412, the GPU 414, and/or the display processor 418) to perform various functions.
- the system memory 416 may include a non-transitory storage medium.
- the term “non-transitory” indicates that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the system memory 416 is non-movable or that its contents are static. As one example, the system memory 416 may be removed from the content consumer device 14 and moved to another device. As another example, memory, substantially similar to the system memory 416, may be inserted into the content consumer device 14.
- a non-transitory storage medium may store data that can, over time, change (e.g., in RAM).
- the user interface 420 may represent one or more hardware or virtual (meaning a combination of hardware and software) user interfaces by which a user may interface with the content consumer device 14.
- the user interface 420 may include physical buttons, switches, toggles, lights or virtual versions thereof.
- the user interface 420 may also include physical or virtual keyboards, touch interfaces - such as a touchscreen, haptic feedback, and the like.
- the processor 412 may include one or more hardware units (including so-called “processing cores”) configured to perform all or some portion of the operations discussed above with respect to the LFE Tenderer unit 26 of FIG. 1.
- the transceiver module 422 may represent one or more receivers and one or more transmitters capable of wireless communication in accordance with one or more wireless communication protocols.
- the A/V device may, using a network interface coupled to a memory of the A/V streaming device, exchange messages with an external device, where the exchanged messages are associated with the multiple available representations of the soundfield.
- the A/V device may receive, using an antenna coupled to the network interface, wireless signals including data packets, audio packets, video packets, or transport protocol data associated with the multiple available representations of the soundfield.
- one or more microphone arrays may capture the soundfield.
- the multiple available representations of the soundfield stored to the memory device may include a plurality of object-based representations of the soundfield, higher order ambisonic representations of the soundfield, mixed order ambisonic representations of the soundfield, a combination of object-based representations of the soundfield with higher order ambisonic representations of the soundfield, a combination of object-based representations of the soundfield with mixed order ambisonic representations of the soundfield, or a combination of mixed order representations of the soundfield with higher order ambisonic representations of the soundfield.
- one or more of the soundfield representations of the multiple available representations of the soundfield may include at least one high-resolution region and at least one lower-resolution region, and wherein the representation selected based on the steering angle provides a greater spatial precision with respect to the at least one high-resolution region and a lesser spatial precision with respect to the lower-resolution region.
- Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
- computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
- a computer program product may include a computer-readable medium.
- such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
- coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- DSL digital subscriber line
- computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
- the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
- the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
- the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
- Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GR20190100269 | 2019-06-20 | | |
US16/714,468 US11122386B2 (en) | 2019-06-20 | 2019-12-13 | Audio rendering for low frequency effects |
PCT/US2020/037926 WO2020257193A1 (en) | 2019-06-20 | 2020-06-16 | Audio rendering for low frequency effects |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3987824A1 (en) | 2022-04-27 |
Family
ID=71465428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20736832.5A Pending EP3987824A1 (en) | 2019-06-20 | 2020-06-16 | Audio rendering for low frequency effects |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3987824A1 (en) |
CN (1) | CN114128312A (en) |
WO (1) | WO2020257193A1 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150264483A1 (en) * | 2014-03-14 | 2015-09-17 | Qualcomm Incorporated | Low frequency rendering of higher-order ambisonic audio data |
WO2015147434A1 (en) * | 2014-03-25 | 2015-10-01 | Intellectual Discovery Co., Ltd. | Apparatus and method for processing audio signal |
US10405126B2 (en) | 2017-06-30 | 2019-09-03 | Qualcomm Incorporated | Mixed-order ambisonics (MOA) audio data for computer-mediated reality systems |
- 2020-06-16 WO PCT/US2020/037926 patent/WO2020257193A1/en active Application Filing
- 2020-06-16 CN CN202080051077.5A patent/CN114128312A/en active Pending
- 2020-06-16 EP EP20736832.5A patent/EP3987824A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN114128312A (en) | 2022-03-01 |
WO2020257193A1 (en) | 2020-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10952009B2 (en) | | Audio parallax for virtual reality, augmented reality, and mixed reality |
EP2954703B1 (en) | | Determining renderers for spherical harmonic coefficients |
EP2954521B1 (en) | | Signaling audio rendering information in a bitstream |
US20200013426A1 (en) | | Synchronizing enhanced audio transports with backward compatible audio transports |
JP2016523467A (en) | | Binauralization of rotated higher-order ambisonics |
US11122386B2 (en) | | Audio rendering for low frequency effects |
WO2015138856A1 (en) | | Low frequency rendering of higher-order ambisonic audio data |
US20200120438A1 (en) | | Recursively defined audio metadata |
US11081116B2 (en) | | Embedding enhanced audio transports in backward compatible audio bitstreams |
US11062713B2 (en) | | Spatially formatted enhanced audio data for backward compatible audio bitstreams |
US9466302B2 (en) | | Coding of spherical harmonic coefficients |
EP3987824A1 (en) | 2022-04-27 | Audio rendering for low frequency effects |
US11967329B2 (en) | | Signaling for rendering tools |
US20240129681A1 (en) | | Scaling audio sources in extended reality systems |
US20210264927A1 (en) | | Signaling for rendering tools |
WO2024081530A1 (en) | | Scaling audio sources in extended reality systems |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20211203 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: GRANT OF PATENT IS INTENDED |
| | INTG | Intention to grant announced | Effective date: 20240219 |