WO2015017914A1 - Media production and distribution system for custom spatialized audio - Google Patents

Media production and distribution system for custom spatialized audio

Info

Publication number
WO2015017914A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
media
hrir
hrtf
consumer
Prior art date
Application number
PCT/CA2014/000603
Other languages
French (fr)
Inventor
Matt HOLLAND
Sean Cunningham
Carissa OUELLETTE
Original Assignee
Audilent Technologies Inc.
Priority date
Filing date
Publication date
Application filed by Audilent Technologies Inc.
Publication of WO2015017914A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/40 Visual indication of stereophonic sound image

Definitions

  • the present technology relates to media production and distribution apparatus.
  • binaural synthesis in which computers and software are used to replicate a dual-channel playback system that captures the physical properties of how sound waves interact with the torso, head, and pinna of the human body.
  • the first method involves a sequence of record-replay techniques and multichannel mixing technologies called ambisonics.
  • ambisonic techniques will utilize a multi-channel speaker array (sometimes up to 24 loudspeakers) to recreate realistic 3D acoustic environments that are governed by sound physics.
  • the theory behind this methodology is a well-studied topic, and can achieve impressive spatial acuity as the number of speakers in a playback array grows.
  • Unfortunately, the commercial viability and adoption potential of such systems begins to decrease as these speaker arrays become more and more complex.
  • the second method of producing spatial audio is more akin to the manner in which free-field sound environments are experienced in the real world: with two channels.
  • One form of this is stereo recording.
  • "Stereo" recordings are essentially any recording made with two channels of audio, where the signal on each channel is different.
  • two microphones spaced some reasonable distance apart are used to create a recording. As slightly different sound waves hit each microphone, slightly different sounds are recorded in each channel. When one plays the recording back, they hear a sense of space between the speakers (or headphones) which creates the stereo image.
  • binaural recordings are two channel recordings created by placing two omnidirectional microphones inside, or as close to the ears as is practical.
  • the head and ear structure affect the way sound waves are picked up by the microphones so that the location information contained in the frequency, amplitude and phase responses of the left and right channels closely match the cues required by the human auditory system to localize sound sources. Positioned in this way, the microphones accurately capture sonic information coming from all directions and will produce extremely realistic recordings when listened to through headphones.
  • Binaural perception of a spatial environment is a direct result of having two ears and physical anatomy that reflects, refracts, and attenuates the physical properties of sound waves.
  • the human brain is an adept processor of these minuscule differences in attenuation, frequency, and arrival times at the left and right ear. The brain receives and processes these physical variations in sound pressure and translates them into localization information.
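The interaural arrival-time cue mentioned above can be illustrated with a toy calculation. The sketch below uses Woodworth's spherical-head approximation; the head radius and the formula itself are textbook assumptions, not values taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, an assumed average adult head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the ITD in seconds.

    azimuth_deg: source angle from straight ahead, positive toward the
    right ear. ITD = (a / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source directly to the right (90 degrees) arrives roughly 0.66 ms
# earlier at the near ear; a source straight ahead yields zero ITD.
itd_side = interaural_time_difference(90.0)
itd_front = interaural_time_difference(0.0)
```

The brain resolves differences on the order of tens of microseconds, which is why such small ITDs carry usable localization information.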
  • Research dating back to the 1990s and 2000s explored the complex interactions between human anatomy and sound waves as they enter the ear canal. In some studies, two small, expensive, and highly sensitive microphones would be placed at the entrance of each ear canal of a subject. The subject would then be exposed to a myriad of controlled sound sources.
  • HRTF: Head-Related Transfer Function
  • an HRIR/HRTF set is highly dependent on the location of the sound source. This is due to the change in anatomical filtering effects resulting from variations in sound wave impact angles, and thus reflections of those sound waves prior to entering the ear canal.
  • an HRTF for the right ear and an HRTF for the left ear represents how the sound is anatomically filtered prior to entering each ear canal.
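The per-ear filtering described above amounts, in the time domain, to convolving a dry source with the HRIR measured for each ear. The sketch below uses deliberately tiny, made-up impulse responses (real HRIRs are hundreds of taps long) purely to show the structure of the operation.

```python
import numpy as np

def binaural_filter(mono, hrir_left, hrir_right):
    """Filter a mono source through a left/right HRIR pair.

    Each ear signal is the convolution of the dry audio with the impulse
    response measured (or synthesized) for that ear; the pair encodes the
    anatomical filtering for one source direction.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # shape: 2 x (len(mono) + taps - 1)

# Toy HRIRs for a source on the listener's right: the right ear hears the
# sound immediately and louder, the left ear delayed and attenuated.
hrir_l = np.array([0.0, 0.0, 0.4])
hrir_r = np.array([0.9, 0.0, 0.0])
stereo = binaural_filter(np.array([1.0, 0.5]), hrir_l, hrir_r)
```

A different source location would call for a different HRIR pair, which is exactly why a full collection indexed by direction is needed.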
  • Binaural recording methods are quite similar to the manner in which HRIR/HRTF data are collected, except they do not aim to synthesize audio through digital signal processing techniques. Binaural recording methods simply place highly sensitive microphones at the left and right ear channels of an anatomically-correct dummy and record a live sound environment.
  • the dummy usually comprises the upper torso, the head, and an average-sized pinna for the left and right ear.
  • Headphone playback of the recordings generally achieves impressive externalization and localization for most people. This is partly because the recording method does not rely on any digital signal processing techniques to replicate the many other acoustical phenomena present in a sound environment, such as room reverberation, timbre, and background ambient noise. These factors greatly contribute to the realism of the sound environment.
  • the synthetic front-plane HRTF was compared to an acoustically measured front-plane HRTF with errors as low as 2.38% (ranging up to 18.9%). This suggests that there is a growing body of knowledge that is capable of parameterizing ear dimensions to synthesize personalized HRTF data.
  • Customized spatial audio could be useful in a vast number of virtual reality environments purposed towards entertainment or simulated training environments. Companies like Sony, Nokia, and Creative Laboratories are currently exploring applications of spatial audio technology in their products such as headphones, mobile device audio output, and personal computer audio output. The United States military is exploring the use of spatial sound for military simulation training applications. Embedding custom HRTF/HRIR filter data on digital signal processing chips could enable fully customizable hearing aid development. Custom spatial audio could be used to accurately guide the blind in audio-based GPS applications, reducing their reliance on guide dogs or caregivers.
  • a binaural system is needed for the integration of user-specific HRIR/HRTF data collections with a broad range of spatialized media, capable of filtering any number of sound sources (and their associated locations) with any number of unique HRIR/HRTF collections. It would be advantageous if the system included a simple (preferably hardware-free) method for users to determine their customized HRIR/HRTF data set. It would be a further advantage if users were provided with the means for their custom HRIR/HRTF data to be applied to various audio media of interest to them, while at the same time the system remained scalable to a large number of rendering requests varying in complexity.
  • the present technology provides a binaural system for the integration of user-specific HRIR/HRTF data collections with a broad range of spatialized media and is capable of filtering any number of sound sources and their associated locations with any number of unique HRIR/HRTF collections.
  • the technology utilizes servers containing extensive storage capacity and arrays of parallelized processing units (GPUs/CPUs). With these components the server can communicate through the internet and perform complex tasks upon request.
  • the server stores collections of HRIR/HRTF data, as well as audio data and their associated spatialization directives.
  • the server's parallelized processing unit arrays are employed to render media data featuring customized spatial audio.
  • a system for providing custom spatialized audio includes a software module that is utilized by a media producer and includes a media producer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on a device running the software module.
  • a software module is provided that is utilized by a media consumer and includes a consumer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module.
  • An audio apparatus enables the consumer to review and edit the spatialization of the producer rendered audio input.
  • An audio apparatus provides a custom rendered audio output for a media consumer.
  • the producer rendered audio input includes positional data and raw audio that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module.
  • the spatialization and rendering using consumer HRIR or HRTF data is accessible for streaming via an internet-connection or downloading to a local media playback device.
  • a method for producing and distributing custom spatialized media to a user includes providing at least one audio track into a spatialization system.
  • the system accesses HRIR or HRTF data on the producer, spatializing the audio track.
  • the system renders the audio track to provide a producer rendered audio track comprising positional data and raw audio.
  • the system utilizes the internet for uploading and storing the rendered audio track permanently or temporarily on a server for future access by a user.
  • the method further includes providing internet access to facilitate the user requesting the producer rendered audio track stored on the internet-accessible server and facilitate the consumer reviewing and editing the spatialization of the producer rendered audio track.
  • the method finally includes storing permanently or temporarily a statistically approximated HRIR or HRTF as HRIR or HRTF data with respect to each user.
  • the system accesses HRIR or HRTF data on the user stored on the internet-accessible server, spatializing the producer rendered audio track, and rendering the producer rendered audio track to produce a user rendered audio track which may be accessible for streaming via an internet connection or downloading to a local media playback device.
  • the system and method, as described above, enables users to obtain custom spatialization of audio over the internet. This is a dramatic development in that it takes a technology that has, to this point, primarily been utilized by academics in research environments with extremely limited public access and makes it available to any member of the public having a device capable of connecting to the internet.
  • the HRIR or HRTF database will be dynamic and collect a user profile with HRIR or HRTF data from each new user. Having once used the service, with every subsequent use of the service the user will receive an audio file spatialized to correspond with the previously recorded HRIR or HRTF data.
  • the software module that is utilized will include a positioner for providing positional data to the audio apparatus. This will enable the consumer to render and review the spatial directives to provide "customization" of the spatialization of the audio output as desired. It is also envisaged the user will have the option of saving the user rendered audio track on an internet-accessible server for the purpose of future access. It will be appreciated that the use of the method and system may be extended to audio tracks associated with video.

BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram overview of the system in accordance with the present technology.
  • a binaural audio production and distribution system generally referred to as 10 is shown in Figure 1.
  • the system 10 includes three modules: the audio production module 100, the media consumer module 300, and the audio services module 200.
  • the audio production module 100 is any electronic device with the capacity to utilize standard audio production software, store data locally, and send data via an internet connection, for example but not limited to personal computers or modern mobile devices.
  • the audio production module 100 serves as an apparatus for retrieving, from the media producer 109, the raw spatialization data 105,205 required to create media files that feature customized audio spatialization.
  • the audio production module 100 can also function as a platform for the rendering of media files featuring customized audio spatialization.
  • the media consumer module 300 is any device with Internet access and the capability to store or stream media files, for example, but not limited to personal computers or modern mobile devices.
  • the media consumer module 300 is the platform from which a request for media featuring customized audio spatialization originates.
  • the media consumer module 300 can also function as a platform for the rendering of media files featuring customized audio spatialization.
  • the media consumer module 300 has the ability to generate raw spatialization data using input from the media consumer, external/embedded hardware, or other software programs running on the system.
  • the audio services module 200 is an apparatus that contains a number of functional parts required for the rendering and delivery of media featuring spatialized audio.
  • the audio services module 200 may include, but is not limited to, storage for:
  • Database 203 containing numerous HRIR/HRTF collections that represent media consumers' or media producers' hearing profiles
  • the audio services module 200 optionally includes an Audio Rendering Apparatus 204 for the rendering of media featuring spatialized audio.
  • the audio services module 200 services requests for media featuring spatialized audio by retrieving and transmitting:
  • the audio services module 200 is implemented as a standalone hub or is integrated into an existing media distribution system.
  • the Spatialization Apparatus 101 is a software program and/or a software program combined with external hardware that uses input from the media producer 109 to add binaural spatialization to audio tracks.
  • the Spatialization Apparatus 101 is implemented as standalone software running on the audio production module 100 or is software executed from within an existing audio creation/editing environment.
  • a media producer 109 uses the Spatialization Apparatus 101 in conjunction with un-filtered audio data 107 as the first step in the process of creating spatialized binaural audio customized to the listener.
  • the apparatus has a graphical user interface and/or an external hardware controller to represent and control the location of any sound source(s) in a virtual 3D environment.
  • the Spatialization Apparatus 101 enables the media producer 109 to set the azimuth, elevation and distance of a dry audio track in relation to the virtual location of the listener.
  • the virtual position of the audio track is dynamic, and the Spatialization Apparatus 101 tracks this virtual positioning over the duration of the audio track.
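The time-varying positional directives the Spatialization Apparatus tracks could be represented, for instance, as a list of keyframes. The field names, sample rate, and values below are illustrative assumptions; the patent does not prescribe a concrete data layout.

```python
from dataclasses import dataclass

@dataclass
class PositionKeyframe:
    """One spatialization directive: where a track sits at a given sample."""
    sample_index: int
    azimuth_deg: float    # horizontal angle relative to the virtual listener
    elevation_deg: float  # vertical angle relative to the virtual listener
    distance_m: float     # range from the virtual listener

# A source that sweeps from straight ahead to the listener's right over
# one second, assuming a 44.1 kHz sample rate.
trajectory = [
    PositionKeyframe(0,     azimuth_deg=0.0,  elevation_deg=0.0, distance_m=2.0),
    PositionKeyframe(22050, azimuth_deg=45.0, elevation_deg=0.0, distance_m=2.0),
    PositionKeyframe(44100, azimuth_deg=90.0, elevation_deg=0.0, distance_m=2.0),
]
```

Stored alongside the unfiltered audio, such a trajectory is exactly the kind of generated positional data that, together with the dry track, forms the raw spatialization data described below.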
  • the Spatialization Apparatus 101 allows the media producer 109 to hear playback of the spatialization effects using their own personalized hearing profile (HRIR/HRTF Collection) 102.
  • HRIR/HRTF Collection representing the media producer's 109 hearing profile is present on the audio production module and accessible by the Spatialization Apparatus 101.
  • the apparatus 101 uses this HRIR/HRTF collection 102 to filter any audio tracks to which the media producer 109 has applied spatialization effects so that the output may be monitored in real-time during playback. In essence, the media producer 109 is listening to the spatialization effect as customized for himself.
  • the software is used on any number of audio tracks that are then mixed together during the audio rendering process with the audio rendering apparatus 104,204, 304 to form a complete binaural audio file.
  • the purpose of the software is to produce the raw spatialization data for each audio track 105, 205 that is used as an input to the audio rendering stage.
  • the generated positional data 108, 208, 308 for any particular audio track may consist of any combination of positional data, room modeling data, or headphone equalization data for each of the audio tracks comprising a complete binaural audio file. Additionally, in the consumer media module 300, the generated positional data 308 may also contain data regarding the current location of the user in reference to the media consumer module. This information, when available, may be used to modify the positional data of each audio track.
  • the media consumer module 300 may generate spatialization data 308 by user inputs and/or external hardware inputs.
  • the locally generated position data 308 can be combined with unfiltered audio tracks to form local raw spatialization data that can be input to the Audio Rendering Apparatus 304 on the media consumer module 300.
  • the raw spatialization data 105, 205, 305 for any particular audio track consists of the unfiltered audio data 107, 207, 307 and generated positional data 108, 208, 308 correlating its location in virtual three dimensional space over the duration of the track.
  • the generated positional data 208 indicates the location of the sound source on either a sample by sample basis or over specified sample/time intervals.
  • the raw spatialization data 105, 205, 305 for each audio track is used by the Audio Rendering Apparatus 204 to select the appropriate filter coefficients within an HRIR/HRTF collection 102 that are then convolved with the original audio data 110 to produce time- varying 3D spatialization.
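This selection-and-convolution step can be sketched as block-wise processing: for each block of audio, look up the HRIR pair for the track's current position and convolve. Nearest-neighbour lookup keyed by azimuth alone is an assumption to keep the sketch short; a real collection is indexed by azimuth, elevation, and distance, and production renderers typically interpolate between measured positions.

```python
import numpy as np

def render_time_varying(track, positions, hrir_collection, block=1024):
    """Render a mono track with time-varying 3D position.

    hrir_collection: dict mapping azimuth (deg) -> (hrir_left, hrir_right).
    positions: one azimuth per block of `block` samples.
    Each block is convolved with the nearest HRIR pair and overlap-added.
    """
    taps = len(next(iter(hrir_collection.values()))[0])
    out = np.zeros((2, len(track) + taps - 1))
    for i, az in enumerate(positions):
        start = i * block
        chunk = track[start:start + block]
        if len(chunk) == 0:
            break
        # Nearest-neighbour filter selection from the collection.
        key = min(hrir_collection, key=lambda k: abs(k - az))
        hl, hr = hrir_collection[key]
        span = len(chunk) + taps - 1
        out[0, start:start + span] += np.convolve(chunk, hl)
        out[1, start:start + span] += np.convolve(chunk, hr)
    return out
```

Swapping in a different listener's HRIR/HRTF collection re-renders the same raw spatialization data customized to that listener, which is the core idea of the system.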
  • the audio services module 200 contains raw spatialization data for every media file available for download/streaming.
  • the Audio Rendering Apparatus 104, 204, 304 is a software program and/or a connected external hardware system that takes the raw spatialization data 105, 205, 305, convolves it with coefficients from an HRIR/HRTF collection 102 and outputs a binaural audio file 106, 206, 209, 306 in any standard streaming/playback format.
  • the standard inputs to the Audio Rendering Apparatus 104, 204, 304 are the raw spatialization data 105, 205, 305 output by the Spatialization Apparatus 101 and a single (or multiple) HRIR/HRTF collection(s) 103, 302 or the database 203. Additionally, the Audio Rendering Apparatus 104, 204, 304 can use locally generated location data 305 combined with traditional unfiltered audio 307 as inputs. Additionally, the Audio Rendering Apparatus 304 may utilize input data provided by external media 501 that consists of traditional unfiltered audio 307, and may additionally contain pre-determined positional data to provide the Audio Rendering Apparatus 304 with complete raw spatialization data 305.
  • the output of the apparatus is a binaural 3D media file 106, 206, 209, 306 encoded in standard streaming/playback formats.
  • Each track that is to be spatialized within a multi-track audio stream/file is processed using filter coefficients retrieved from the same HRIR/HRTF collection 102 and selected based on the location information stored in the raw spatialization data 105.
  • the individual spatialized audio tracks are then mixed together into a complete binaural 3D media file 106, 206, 209, 306 encoded in standard streaming/playback formats. In this manner, an audio stream/file is produced that features binaural spatialization unique to that particular HRIR/HRTF collection 102. If a database of HRIR/HRTF collections 103 is available, the Audio Rendering Apparatus 204 can render an output stream/file for each HRIR/HRTF collection 103 in the database.
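The mix-down step can be sketched as a sum of the per-track stereo renders. Peak normalization is an assumption added so the example stays within a valid sample range; the patent does not specify a gain-staging scheme.

```python
import numpy as np

def mix_binaural(rendered_tracks):
    """Mix individually spatialized stereo tracks into one binaural programme.

    rendered_tracks: list of 2 x N arrays, each already filtered with the
    same listener's HRIR/HRTF collection. Tracks may differ in length;
    shorter ones are zero-padded implicitly by the summation.
    """
    n = max(t.shape[1] for t in rendered_tracks)
    mix = np.zeros((2, n))
    for t in rendered_tracks:
        mix[:, :t.shape[1]] += t
    # Peak-normalize only if the sum clips, so quiet mixes are untouched.
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Because every track in the mix was filtered with the same collection, the combined file carries spatialization cues consistent for that one listener.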
  • the HRIR/HRTF Calibration Apparatus 401 determines the hearing profile of media consumers 301 and media producers 109 using the audio production module 100.
  • the output of the Calibration Apparatus 401 is a user profile 202 leading to a database 203 modeling the media consumer's hearing profile that is uploaded to and stored on the audio services module 200, or downloaded to and stored on the media consumer module 300.
  • the Calibration Apparatus 401 is a software program executed on the media consumer module 300, a web application served from an online resource, or a hardware solution for directly measuring the listener's hearing profile.
  • the Calibration Apparatus 401 can either generate a custom HRIR/HRTF collection 302 corresponding to the listener's hearing profile (HRIR/HRTF collection generation), or match the listener to an existing HRIR/HRTF collection in the database 203 that closely models their hearing profile (HRIR/HRTF collection matching).
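The matching path could be sketched as a nearest-neighbour search over anthropometric features. The feature set (head width, pinna height), the distance metric, and the collection names below are all illustrative assumptions; the patent leaves the matching criteria open.

```python
import math

def match_hrtf_collection(user_features, database):
    """Match a listener to the stored HRIR/HRTF collection whose donor's
    anthropometry is closest in Euclidean distance.

    user_features: tuple of measurements, e.g. (head_width_cm, pinna_height_cm).
    database: dict mapping collection id -> feature tuple of the same shape.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda cid: dist(user_features, database[cid]))

# Hypothetical static database entries; ids and numbers are made up.
best = match_hrtf_collection((15.2, 6.4), {
    "collection_a": (15.0, 6.5),
    "collection_b": (14.1, 5.8),
})
```

A richer calibrator would weight features by their perceptual importance or fold in the interactive game-based feedback mentioned above, but the lookup structure stays the same.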
  • the Calibration Apparatus 401 utilizes various user inputs to determine an appropriate HRIR/HRTF collection that models the user's unique perception of sound. These inputs include but are not limited to:
  • Anthropometric measurements of the user's head and ears obtained through manual measurement
  • Anthropometric measurements of the user's head and ears obtained through digital images
  • User feedback obtained through an interactive process, for example but not limited to user performance while playing a game dependent on spatial audio cues
  • the calibration process either matches the user to an HRIR/HRTF collection from the database 203 that closely models their hearing profile or generates a unique HRIR/HRTF collection 302 specific to that user.
  • the result of the calibration process is an HRIR/HRTF collection 302 to be used by the audio production module 100 for providing spatialized audio specific to the calibrated user.
  • a reference to the HRIR/HRTF collection 302 is saved to the user's profile 202 for later use in on-demand media rendering or retrieval of pre-rendered media. Alternatively, it could be saved directly to the media consumer module 300.
  • the database 203 of HRIR/HRTF Collections is a number of HRIR/HRTF collections 102, 302 each corresponding to a unique hearing profile stored on the audio services module 200 and the audio production module 100.
  • a portion of the database can be static, and may contain a fixed number of HRIR/HRTF collections 102, 302.
  • the Calibration Apparatus 401 uses this static portion to determine a media consumer's hearing profile by matching them to their best fit HRIR/HRTF collection from the database 203.
  • the database 203 stored on the audio production module 100 consists of the portion of the HRIR/HRTF database which is static.
  • the database can also consist of custom HRIR/HRTF collections output by the Calibration Apparatus 401 or uploaded by the media consumer 301.
  • This portion of the database contains at least one HRIR/HRTF collection for each media consumer 301 that has had a fully customized hearing profile created after undergoing calibration or has uploaded a previously obtained HRIR/HRTF collection.
  • the size of this portion of the database is dynamic and increases as each new customized hearing profile is stored.
  • a User Profile 202 is a collection of user data that may contain a generated listening profile represented by a customized HRIR/HRTF collection which would constitute an entry in the dynamic portion of the database of HRIR/HRTF collections, and/or a link to a matched HRIR/HRTF collection stored within the static portion of the database on the audio services module 200.
  • the user data is uploaded to the User Profile 202 by the described Calibration Apparatus or obtained through other third party means and uploaded by the media consumer 301.
  • a User Profile 202 is stored on the audio services module 200 for each media consumer 301 that wishes to request media featuring spatialized audio.

Fully-Rendered Media
  • the fully-rendered media 106, 206 consists of media featuring spatialized audio that is stored on the audio services module 200.
  • the fully rendered media are either output from the Audio Rendering Apparatus 204 present on the audio services module 200, or output from the Audio Rendering Apparatus 104 and then transmitted to the audio services module 200.
  • Each media file that is available to be downloaded/streamed from the audio services module 200 is rendered once for every HRIR/HRTF collection in a static database and stored as fully-rendered media.
  • the Audio Rendering Apparatus 304 found on the media consumer module 300 may output fully rendered media 306 specifically for the media consumer 301. This particular rendered media will be stored locally on the media consumer's system.
  • the Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 are implemented as part of the audio services module 200.
  • the two apparatuses are implemented as a single functioning unit but are shown separately in Figure 1 for increased clarity. They receive and serve the request from the media consumer 301 for media featuring spatialized audio.
  • the initial input to the Media Retrieval Apparatus 201 and Raw Spatialization Data Retrieval Apparatus 210 is a request originating from the media consumer module 300 for media featuring spatialized audio.
  • the Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 analyze the request to determine what action(s) are to be taken by the audio services module 200.
  • the Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 queries the User Profiles 202 stored on the audio services module 200 to retrieve the HRIR/HRTF collection(s) representing each media consumer's 301 hearing profile. It can access and transmit the media that was previously rendered and stored on the audio services module 200. Additionally, it can access and transmit the raw spatialization data 205 stored on the audio services module 200, and it can initiate the on-demand rendering process performed by the Audio Rendering Apparatus 304 using either previously stored positional data 208 or user-generated positional data 308. The Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 would then transmit the rendered media.
  • the Media Retrieval Apparatus 201 queries the requestor's User Profile 202 to determine the HRIR/HRTF collection matched to the media consumer 301. The Media Retrieval Apparatus 201 then transmits to the media consumer 301 the version of the requested media that was rendered with that particular HRIR/HRTF Collection.
  • the Media Retrieval Apparatus 201 retrieves the user's HRIR/HRTF collection associated with the requesting User Profile 202. This HRIR/HRTF collection along with the raw spatialization data 205, 305 for the requested media are input to the Audio Rendering Apparatus 204, 304 and the media is rendered. The Media Retrieval Apparatus then transmits to the media consumer the media rendered with their specific HRIR/HRTF collection.
  • the Media Retrieval Apparatus 201 retrieves and transmits to the media consumer 301 the raw spatialization data 205 for the requested media.
  • the Media Retrieval Apparatus 201 retrieves and transmits to the media consumer 301 the unfiltered audio data 207 for the requested media.
  • the HRIR/HRTF collection 102, 302 illustrated in Figure 1 on both the audio production module 100 and media consumer module 300 represents a single HRIR/HRTF collection representing the media producer's 109 and media consumer's 301 hearing profiles respectively.
  • the HRIR/HRTF collection 102 on the audio production module 100 is used by the Spatialization Apparatus 101 to allow the media producer 109 to monitor the spatialized audio they are creating with their own hearing profile.
  • the HRIR/HRTF collection 302 on the media consumer module 300 is used as an input to the Audio Rendering Apparatus 204 for any media rendering performed on the media consumer module 300.
  • the media producer 109 is any entity using an audio production module 100 to create media featuring spatialized audio.
  • Media with Unfiltered Audio Data 107, 207, 307 is defined as any media containing audio that has not undergone spatialization and will be used as an input to either the Spatialization Apparatus 101 or the Audio Rendering Apparatus 104.
  • the media consumer 301 is any entity using a media consumer module 300 to request and/or listen to media featuring spatialized audio.
  • a media request 303 is any request originating from a media consumer 301 to the audio services module 200 for media featuring spatialized audio.
  • a suitable exemplary operation of the audio rendering apparatus comprises the following:
  • the media rendering is performed by the Audio Rendering Apparatus 104 functioning on the Audio Production Module 100. Multiple media files are rendered and then uploaded to the audio services module 200. The number of audio files generated is dependent on the total number of provided HRIR/HRTF collections in the HRIR/HRTF database stored on the audio production module 100 (i.e. a unique media file is produced for each HRIR/HRTF collection within the database). The rendered media are then packaged, transmitted, and stored within the audio services module 200, from which they can be streamed/downloaded.
  • the media rendering is performed by the Audio Rendering Apparatus functioning on the audio services module 200.
  • the raw spatialization data 105 output from the Spatialization Apparatus 101 is transmitted from the audio production module 100 and stored on the audio services module 200.
  • the media rendering is then performed within the audio services module 200 using server resources to fully render multiple media files.
  • the number of media files generated is dependent on the total number of collections in the static portion of the database of HRIR/HRTF collections 203 (i.e. a unique media file is produced for each entry within the database).
  • the entire collection of rendered media files is then stored within the audio services module 200 from which they can be streamed/downloaded.
  • the media rendering is performed by the Audio Rendering Apparatus functioning on the audio services module 200.
  • the raw spatialization data 205 for the requested media has been previously transmitted from the Media Production System and is stored within the audio services module 200.
  • only one media file is rendered which corresponds to a specific HRIR/HRTF collection associated with the hearing profile of the media consumer issuing the request.
  • the Media Retrieval Apparatus retrieves the appropriate raw spatialization data as well as media consumer's HRIR/HRTF collection and passes these inputs to the Audio Rendering Apparatus. The media are then rendered and transmitted to the requestor.
  • the media rendering may be performed on-demand by the Audio Rendering Apparatus functioning on the media consumer module 300 using local resources.
  • the rendering is performed using raw spatialization data that will have been transmitted from the audio services module 200 via the Spatialization Data Retrieval Apparatus 210.
  • the media rendering is performed using a HRIR/HRTF collection corresponding to the media consumer's hearing profile that is either stored locally on the media consumer module 300 or downloaded from the audio services module 200 at the time of request. After the media has been rendered it is available for playback or storage on the media consumer module 300.
  • the media rendering is performed on-demand by the Audio Rendering Apparatus functioning on the media consumer module 300 using local resources.
  • the rendering is performed using unfiltered audio tracks and positional data that is generated locally either on the media consumer module 300 or hardware attached to the media consumer module 300.
  • the media rendering is performed using a HRIR/HRTF collection corresponding to the media consumer's hearing profile that is stored locally on the media consumer module 300. After the media has been rendered it is available for playback or storage on the media consumer module 300.
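The rendering step shared by the variants above — filtering each unfiltered audio track with the position-appropriate HRIR pair from a given collection, mixing to stereo, and repeating once per collection in the database — can be sketched as follows. This is a minimal illustrative sketch only; the function and variable names (`render_all`, `hrir_collection`, etc.) are assumptions rather than part of the disclosed system, and a practical renderer would use block-based FFT convolution instead of this direct form.

```python
def convolve(signal, impulse):
    """Direct-form convolution of a mono track with one HRIR channel."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def mix(a, b):
    """Sample-wise sum of two channels of possibly different length."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def render_for_collection(raw_tracks, hrir_collection):
    """Render one media file: spatialize every track with the HRIR pair
    selected by its position label, then mix everything down to stereo."""
    mixed_l, mixed_r = [], []
    for track, position in raw_tracks:
        left_hrir, right_hrir = hrir_collection[position]
        mixed_l = mix(mixed_l, convolve(track, left_hrir))
        mixed_r = mix(mixed_r, convolve(track, right_hrir))
    return mixed_l, mixed_r

def render_all(raw_tracks, hrir_database):
    """One unique rendered media file per HRIR/HRTF collection in the database."""
    return {name: render_for_collection(raw_tracks, coll)
            for name, coll in hrir_database.items()}
```

The dictionary returned by `render_all` mirrors the described behaviour of producing a distinct rendered file for each HRIR/HRTF collection, ready for packaging and upload.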
  • a suitable exemplary operation of the calibration apparatus comprises the following:
  • Anthropometric measurements of the user's head and ears are obtained manually by the user and input to the Calibration Apparatus.
  • the anthropometric data is then used by the Calibration Apparatus to determine an HRIR/HRTF collection from a database of collections that best matches the user.
  • This matched HRIR/HRTF collection would have corresponding anthropometric measurements with the highest possible correlation to the input data and is sourced from an existing database stored on the audio services module 200.
  • Digital images of the user's head and ears are input into the Calibration Apparatus which then obtains the required anthropometric measurements from the images.
  • a unique HRIR/HRTF collection specific to the user will then be generated using the obtained measurements and/or the user will be matched to an HRIR/HRTF collection as described above. If a customized HRIR/HRTF collection is generated, it will then be uploaded to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile. Similarly, if matching to an existing HRIR/HRTF collection was performed, the existing collection (stored in the static portion of the database) will also be linked to the User Profile.
  • the Calibration Apparatus obtains user feedback through an interactive process and uses the feedback to subjectively determine the best performing HRIR/HRTF collection from a database of collections (performance matching). This matched HRIR/HRTF collection has corresponding anthropometric measurements with the highest possible correlation to the input data and is sourced from an existing database stored on the audio services module 200.
  • the Calibration Apparatus obtains user feedback through an interactive process and uses this feedback to generate a customized HRIR/HRTF collection by modifying an existing HRIR/HRTF collection that more closely represents the user's hearing profile.
  • the result of this method is an HRIR/HRTF collection unique to the user.
  • the customized HRIR/HRTF collection is generated, it is uploaded to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile.
  • the user's HRIR/HRTF collection is obtained through direct measurement. Through front-end software, this customized HRIR/HRTF collection is then uploaded by the user to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile.
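The anthropometric matching step described above — selecting, from a static database, the HRIR/HRTF collection whose stored measurements correlate best with the user's — might look like the following nearest-match sketch. The measurement vector and database layout are hypothetical; a deployed calibration apparatus would define a specific set of head and pinna parameters.

```python
def match_collection(user_measurements, database):
    """Return the key of the database entry whose stored anthropometric
    measurements have the lowest mean squared error against the user's.
    `database` maps collection name -> {"anthropometry": [floats], ...}."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return min(database,
               key=lambda k: mse(user_measurements, database[k]["anthropometry"]))
```

Minimizing mean squared error is consistent with the document's later observation that ear-parameter matching helps most when the MSE between measured and stored parameters is low.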
  • a suitable exemplary operation of the media request apparatus comprises the following:
  • the fulfillment of a media request from the user is accomplished by the audio services module 200 in multiple ways, depending on the nature of the media request.
  • the media request can be serviced by providing rendered media directly to the user.
  • This rendered media will have been previously processed by the Audio Rendering Apparatus 104, 204 and is stored within the audio services module 200 in the rendered media form.
  • the rendered media are selected for the user by cross-referencing the HRIR/HRTF collection linked to the User Profile during the calibration process to the corresponding media file rendered with that specific HRIR/HRTF collection.
  • the media request can be serviced by "on-demand" rendering within the audio services module 200.
  • the media are rendered by the Audio Rendering Apparatus using the corresponding raw spatialization data and the HRIR/HRTF collection linked to the User Profile, both of which are stored on the audio services module 200.
  • Once the media file has been rendered, it is transmitted to the media requestor.
  • the media request is serviced by "on-demand" rendering performed on the media consumer module 300.
  • upon receiving a media request, the audio services module 200 transmits the raw spatialization data for the requested media to the Audio Rendering Apparatus located on the media consumer module 300.
  • An HRIR/HRTF collection corresponding to the media requestor's hearing profile will also be transmitted from the audio services module 200.
  • the media are then rendered by the Audio Rendering Apparatus on the media consumer module 300, and are available for playback/streaming to the media consumer.
  • the media request is serviced by "on-demand" rendering performed on the media consumer module 300.
  • upon receiving a media request, the audio services module 200 transmits the raw spatialization data for the requested media to the Audio Rendering Apparatus located on the media consumer module 300.
  • one or more HRIR/HRTF collections corresponding to the user are gathered from internal storage on the media consumer module 300.
  • the media are then rendered by the Audio Rendering Apparatus on the media consumer module 300, and are available for playback/streaming to the media consumer.
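The three fulfillment paths above — pre-rendered delivery, server-side on-demand rendering, and client-side on-demand rendering — can be summarized in a dispatch sketch. All names here (`AudioServices`, `service_request`, the dictionary keys) are illustrative assumptions, not identifiers from the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class AudioServices:
    prerendered: dict = field(default_factory=dict)  # media_id -> {hrir_id: file}
    raw: dict = field(default_factory=dict)          # media_id -> raw spatialization data
    hrir_db: dict = field(default_factory=dict)      # hrir_id -> HRIR/HRTF collection

    def render(self, raw_data, hrir):
        # Stand-in for the server-side Audio Rendering Apparatus.
        return ("rendered", raw_data, hrir)

def service_request(request, profile, services):
    """Dispatch a media request to one of the three fulfillment paths."""
    if request["media_id"] in services.prerendered:
        # Path 1: cross-reference the profile's linked HRIR/HRTF collection
        # to the file that was pre-rendered with that same collection.
        return services.prerendered[request["media_id"]][profile["hrir_id"]]
    if request.get("render_locally"):
        # Path 3: ship raw spatialization data plus the HRIR/HRTF collection
        # to the consumer module for on-demand local rendering.
        return {"spatialization_data": services.raw[request["media_id"]],
                "hrir": services.hrir_db[profile["hrir_id"]]}
    # Path 2: render on the server with the profile's linked collection.
    return services.render(services.raw[request["media_id"]],
                           services.hrir_db[profile["hrir_id"]])
```

In a real deployment the choice among paths would also weigh storage cost (one file per collection), server load (on-demand rendering), and client capability.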
  • a media producer wants to create a song featuring spatialized audio that may be downloaded by media consumers. The consumer will be able to download a version of the song that is specific to their hearing profile. In this way, the media producer is ensured that the consumer will hear the song as intended.
  • the song is uploaded from an audio production module 100 to the audio services module in the form of Raw Spatialization Data.
  • the media producer is creating the song within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
  • the media producer undergoes the hearing profile calibration process before using the Spatialization Apparatus.
  • the media producer uses their mobile phone to capture images of their head and ears.
  • the calibration procedure is run as an application on their mobile device, and performs matching to link an HRIR/HRTF collection representing their hearing profile to a User Profile on the audio services module.
  • the media producer downloads their hearing profile from the audio services module, and attaches it to the Spatialization Apparatus. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
  • the media producer uses the Spatialization Apparatus to apply binaural spatial effects to the desired audio tracks.
  • the digital audio workstation allows the producer to monitor the final mix of the multi-track song when all of the individually spatialized tracks are played together.
  • When satisfied with the song, the media producer will export the raw spatialization data 105 for the song from their system to the audio services module 200.
  • the raw spatialization data is comprised of the generated positional data 108 and unfiltered audio data 107. This exporting/uploading step could be performed within the Spatialization Apparatus or using a standalone application that has access to the raw spatialization data.
  • the song is available to be downloaded/streamed from the audio services module 200. Access to the media is made available to users through on-demand rendering provided by the Audio Rendering Apparatus 204, which utilizes HRIR/HRTF data associated with the user profile 202 generating the request.
  • a media producer wants to create a video featuring spatialized audio that may be viewed by media consumers.
  • the consumer will be able to download or stream a version of the video that features spatialized audio specific to their hearing profile. In this way, the media producer is ensured that the consumer will hear the audio in the video playback as is intended.
  • the video is uploaded from an audio production module 100 to the audio services module 200 in the form of fully rendered video files.
  • the media producer is editing the video's accompanying audio within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
  • the media producer undergoes the hearing profile calibration process before using the Spatialization Apparatus.
  • the producer uses his/her personal computer to play an interactive game that incorporates user feedback to determine the HRIR/HRTF collection that best matches their hearing profile.
  • This HRIR/HRTF collection representing their hearing profile is linked to a user profile 202 on the audio services module 200.
  • the media producer downloads a file containing their hearing profile from the audio services module 200, and attaches it to the Spatialization Apparatus. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
  • the media producer uses the Spatialization Apparatus to apply binaural spatial effects to the desired audio tracks.
  • the DAW allows the producer to monitor the final mix of the multi-track audio file when the individually spatialized tracks are played together.
  • the Audio Rendering Apparatus on the audio production module 100 will be used to render a number of unique copies of the media's audio track.
  • the rendering program will have access to the Raw Spatialization Data output by the Spatialization Apparatus and a database of static HRIR/HRTF collections stored on the Media Production System. A version of the audio track will be rendered for each entry in the database of HRIR/HRTF collections.
  • the fully-rendered media files will be sent to the audio services module where they will then be available for download/streaming by a media consumer.
  • a media producer is creating the soundtrack for a PC video game.
  • the accompanying soundtrack will be spatialized using the HRIR/HRTF collection corresponding to a media consumer's profile.
  • the PC game production company has also opted to include accompanying spatialized sound effects during gameplay (which will also be filtered using the HRIR/HRTF collection corresponding to the media consumer's profile). These spatialized sound effects will have their position vectors and audio generated during gameplay on the media consumer module 300. In this way the game producer is ensured that the consumer will hear all audio in the game, including the correct positional cues from sound effects, as is intended.
  • the game may be sold via any retail methods, and must be linked during installation to the consumer's hearing profile.
  • the media producer is editing the game's accompanying music within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
  • the media producer undergoes the hearing profile calibration process before using the spatialization software.
  • the producer measures and submits to the Calibration Apparatus all necessary anthropomorphic measurements required to determine the HRIR/HRTF collection that best matches their hearing profile.
  • This HRIR/HRTF collection representing their hearing profile is linked to a User Profile on the audio services module 200.
  • the media producer downloads their hearing profile from the audio services module 200, and attaches it to the spatialization software. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
  • the media producer uses the spatialization software to apply binaural spatial effects to the desired audio stems.
  • the DAW allows the producer to monitor the final mix of the multi-track audio file when the individually spatialized tracks are played together.
  • the Audio Rendering Apparatus 104 running on the audio production module 100 will be used to render a number of unique copies of the media's audio track.
  • the rendering program will have access to the raw spatialization data output by the Spatialization Apparatus and a database of HRIR/HRTF collections stored on the Media Production System. A version of the audio track will be rendered for each entry in the database of HRIR/HRTF collections.
  • the standalone program will export/upload the collection of fully rendered audio files to the audio services module 200 where they will then be available for download by a media consumer during the PC game installation.
  • a media consumer wishes to download a customized spatial audio track.
  • the audio services module 200 is integrated within a publicly accessible distribution website.
  • the media requested will be rendered on-demand by the Audio Rendering Apparatus using the raw spatialization data and the consumer's specified HRIR/HRTF collection, all of which are stored on the audio services module 200.
  • the media consumer may then playback the customized audio track with any consumer electronic device which supports standard audio output formats.
  • the audio track may be optimized for spatial effects while listening with either conventional speaker set-ups or any standard pair of headphones/earbuds.
  • the consumer navigates to the media distribution website to create a personal account and to calibrate their hearing profile. Using their webcam, photos are taken of the consumer's head at a requested number of different angles. These photos are then submitted to the online Calibration Apparatus 401, which derives the necessary anthropometric measurements from the images to generate a custom HRIR/HRTF collection representing the consumer's hearing profile.
  • the Media Retrieval Apparatus retrieves and inputs to the Audio Rendering Apparatus the consumer's custom HRIR/HRTF collection and the pre-filtered spatialization data for the song.
  • the custom audio track is rendered in the requested audio file format and transmitted to the media consumer by the Media Retrieval Apparatus. Once the consumer has successfully downloaded the rendered audio track it is deleted from the audio services module 200.
  • Example 5: a media consumer wishes to stream a live concert video that features matched spatialized audio, and the audio services module 200 is integrated with a subscription streaming media provider.
  • the concert footage and raw spatialization data for the audio was previously uploaded to the audio services module 200 by the media producer.
  • the concert video has been fully rendered on the audio services module 200 by the Audio Rendering Apparatus for each HRIR/HRTF collection in a database stored on the server.
  • the media consumer needs to undergo the hearing profile calibration process so that the Media Retrieval Apparatus can retrieve the version of the concert video that was rendered using the HRIR/HRTF collection to which the media consumer is matched.
  • the consumer navigates to the media distribution website to create a personal account and calibrate their hearing profile.
  • the calibration program is downloaded and installed on the media consumer module 300 where it guides the consumer through a series of qualitative spatialized audio comparisons.
  • the calibration program uses the consumer's feedback to determine the entry in a database of HRIR/HRTF collections that best matches their hearing profile. This matched HRIR/HRTF collection is then linked to the consumer's user profile 202.
  • the consumer navigates through the subscription service and selects the desired live concert video.
  • the audio services module 200 streams the version of the fully rendered media that was rendered using the HRIR/HRTF collection that is associated with the consumer's subscription user profile.
  • a media consumer wishes to install and play a downloaded video game that features spatialized audio.
  • the Media Request Apparatus is integrated within the game installation software. This enables the installation software to retrieve the in-game soundtrack media featuring spatialized audio that was pre-rendered by the video game developers using the matched hearing profiles from the static portion of the HRIR/HRTF database.
  • the media consumer installing the video game will download pre-rendered audio that has been filtered with their closest match HRIR/HRTF collection associated with their user profile. This pre-rendered media will comprise audio for the video game's passive playback segments.
  • a copy of the consumer's fully customized HRIR/HRTF collection from the dynamic portion of the HRIR/HRTF database is stored on the media consumer module 300 to enable the game's audio engine (acting as the Audio Rendering Apparatus on the media consumer module 300) to perform real-time rendering of spatialized environmental sounds during live gameplay.
  • the consumer may experience both matched spatialization added to the game's soundtrack and a fully customized spatial effect on all in-game sound effects.
  • the consumer is provided with a custom HRIR/HRTF collection after undergoing direct measurement at a 3rd-party facility.
  • the directly measured hearing profile is uploaded and linked to their online gaming account and stored on the media consumer module 300.
  • the consumer navigates through an online game distribution application and selects a game for download.
  • the game installation service is either downloaded or built in to the media consumer module 300.
  • During gameplay, when a new character is created, the consumer will be prompted to link their User Profile to the game. When accepted and linked, the built-in Media Request Apparatus queries the Media Retrieval Apparatus for the Fully Rendered Media files corresponding to the soundtrack rendered using the consumer's matched HRIR/HRTF collection from the static portion of the database. It will also retrieve the custom HRIR/HRTF collection previously uploaded to the media consumer's online gaming account.
  • the in-game audio engine performs real-time rendering of spatialized environmental sounds while the consumer is playing the video game.
  • Current positional data of the user's head and/or body may also be tracked through the use of hardware connected to the system.
  • the location data used to select the appropriate filters from the HRIR/HRTF collection is acquired through the consumer's interaction with the virtual game environment and the relative position of the user to the system. Meanwhile, the game soundtrack consisting of fully rendered audio files can be directly played during gameplay.
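The live-gameplay filter selection described above — using a sound source's position relative to the tracked listener to pick the appropriate filters from the consumer's HRIR/HRTF collection — could be sketched as follows. This assumes a simplified collection indexed only by azimuth angle; a real collection would also cover elevation and distance, and the names here are illustrative.

```python
import math

def angular_distance(a, b):
    """Smallest absolute angle between two azimuths in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def select_hrir(collection, source_pos, listener_pos, listener_yaw_deg):
    """Pick the measured azimuth in `collection` nearest to the source's
    direction relative to the listener's tracked head orientation.
    Positions are (x, y) pairs; 0 degrees means straight ahead (+y)."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    world_azimuth = math.degrees(math.atan2(dx, dy))
    relative = (world_azimuth - listener_yaw_deg) % 360.0
    return min(collection, key=lambda a: angular_distance(a, relative))
```

Per frame, the game's audio engine would call something like `select_hrir` for each active sound source and convolve the source audio with the returned filter pair, while the pre-rendered soundtrack files play back directly.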


Abstract

The present technology is a system for providing ubiquitous custom spatialized audio. The server-backed system comprises: an audio production module comprising a producer HRIR or HRTF data source, a producer spatialization apparatus, and a producer rendering apparatus for spatialization and rendering of an audio input from a media producer using producer HRIR or HRTF data, to provide rendered audio input; a media consumer module comprising a consumer HRIR or HRTF data source, a consumer spatialization apparatus, and a consumer rendering apparatus for spatialization and rendering of the rendered audio input using consumer HRIR or HRTF data, to provide a rendered audio output for a media consumer; and a media distributor for distributing the rendered audio output to the media consumer.

Description

MEDIA PRODUCTION AND DISTRIBUTION SYSTEM FOR CUSTOM SPATIALIZED AUDIO
FIELD:
[0001] The present technology relates to media production and distribution apparatus.
More specifically, it relates to binaural synthesis in which computers and software are used to replicate a dual-channel playback system that captures the physical properties of how sound waves interact with the torso, head, and pinna of the human body.
BACKGROUND:
[0002] Significant research and development on the topic of free-field sound physics has provided a large body of knowledge regarding the production of spatialized (or 3D) audio. Spatial audio offers many advantages beyond conventional surround sound technology, which generally produces limited frontal and posterior ambient spatial effects. Currently, there are two main approaches to reproducing realistic sound environments.
[0003] The first method involves a sequence of record-replay techniques and multichannel mixing technologies called ambisonics. Typically, ambisonic techniques utilize a multi-channel speaker array (sometimes up to 24 loudspeakers) to recreate realistic 3D acoustic environments that are governed by sound physics. The theory behind this methodology is a well-studied topic, and impressive spatial acuity can be achieved as the number of speakers in a playback array grows. Unfortunately, the commercial viability and adoption potential of such systems decrease as these speaker arrays become more and more complex.
[0004] The second method of producing spatial audio is more akin to the manner in which free-field sound environments are experienced in the real world: with two channels. One form of this is stereo recording. "Stereo" recordings are essentially any recordings made with two channels of audio, where the signal on each channel is different.
[0005] In a stereo recording, two microphones spaced some reasonable distance apart are used to create a recording. As slightly different sound waves hit each microphone, slightly different sounds are recorded in each channel. When the recording is played back, the listener hears a sense of space between the speakers (or headphones), which creates the stereo image.
[0006] Although stereo sound was being studied and explored as early as the 1930s by the famous Bell Laboratories and even Walt Disney, the at-home consumer adoption timeframe spanned well over 40 years, between the 1940s and 1980s. The transition from mono to stereo faced many economic and marketing challenges inherent to introducing a new type of media, and more specifically, complex playback equipment into the industry. Stereo sound also required synchronized recording methods that were not widely practiced. The lesson to take away from this slow mono-to-stereo transition period is that increasing the technical complexity of record and replay techniques results in a slower adoption rate of new audio technology.
[0007] The other form is binaural recordings, or binaural synthesis. "Binaural" recordings are two channel recordings created by placing two omnidirectional microphones inside, or as close to the ears as is practical. Using this technique, the head and ear structure affect the way sound waves are picked up by the microphones so that the location information contained in the frequency, amplitude and phase responses of the left and right channels closely match the cues required by the human auditory system to localize sound sources. Positioned in this way, the microphones accurately capture sonic information coming from all directions and will produce extremely realistic recordings when listened to through headphones.
[0008] Binaural perception of a spatial environment is a direct result of having two ears and physical anatomy that reflects, refracts, and attenuates the physical properties of sound waves. The human brain is an adept processor of these minuscule differences in attenuation, frequency, and arrival times at the left and right ear. The brain receives and processes these physical variations in sound pressure and translates them into localization information.
[0009] Research dating back as early as the 1990s and the 2000s began exploring the complex interactions between human anatomy and sound waves as they enter the ear canal. In some studies, two small, expensive, and highly sensitive microphones would be placed at the entrance of each ear canal of a subject. The subject would then be exposed to a myriad of controlled sound sources. Recordings for the left and right channels revealed that the interactions between a sound source and the human body, torso, and pinna acted much like an anatomically-dependent sound filter. This filtering effect is sometimes called the Anatomical Transfer Function, but in academia is more commonly referred to as an individual's Head Related Impulse Response (HRIR). The Fast Fourier Transform of the time-domain HRIR is referred to as the Head Related Transfer Function (HRTF). It should also be noted that there is a specific HRTF for the right ear as well as the left ear. The left and right HRTFs can be used to filter a dry audio track, which can then be mixed into a stereo audio sample that sounds as though it originates from a point in space when heard over a pair of standard stereo headphones.
[0010] One drawback to binaural synthesis is that an HRIR/HRTF set is highly dependent on the location of the sound source. This is due to the change in anatomical filtering effects resulting from variations in sound wave impact angles, and thus reflections of those sound waves, prior to entering the ear canal. For a single source originating from a single location, an HRTF for the right ear and an HRTF for the left ear represent how the sound is anatomically filtered prior to entering each ear canal. This means that for a single person, there are unique representative HRIR/HRTFs for all points in space. To make matters more challenging, these collections of HRIR/HRTFs are highly dependent on an individual's unique anatomy.
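The HRIR/HRTF relationship described above can be made concrete: since the HRTF is the Fourier transform of the time-domain HRIR, filtering a dry track can be done either by time-domain convolution with the HRIR or by spectral multiplication with the HRTF. The sketch below is illustrative only, using a naive pure-Python DFT for clarity rather than speed; the function names are not from the disclosure.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (for clarity, not performance)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning real samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def filter_with_hrtf(dry, hrir):
    """Filter a dry mono track for one ear: zero-pad both signals to the full
    convolution length, multiply the track's spectrum by the HRTF (the DFT of
    the HRIR), and transform back to the time domain."""
    n = len(dry) + len(hrir) - 1
    D = dft(dry + [0.0] * (n - len(dry)))
    H = dft(hrir + [0.0] * (n - len(hrir)))  # H is the sampled HRTF
    return idft([d * h for d, h in zip(D, H)])
```

Running this once with the left-ear HRIR and once with the right-ear HRIR yields the two channels of the binaural stereo sample described in the paragraph above.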
[0011] Another drawback is that binaural synthesis mixes HRIR/HRTF data and dry audio samples to generate localization, but must additionally account for these other acoustical phenomena with software algorithms and simulated sound physics. This can lead to complex algorithms that are limited by available processing power.
[0012] In the 1990s and 2000s, it became well understood that the anatomy of the torso, head, and ear were critical parameters required for the synthesis of binaural audio. Experimental methods for determining the HRIR/HRTF collections for an individual became public knowledge, but the process generally required an expensive array of loud speakers, binaural microphone recording equipment, and human subjects to run tests on. Thus, while collected HRIR/HRTF data is capable of synthesizing stunningly accurate spatial audio for the original test subject, the same data set used for other individuals results in a poor quality of realism. For this reason, this method of HRTF/HRIR data collection was deemed to be overly complex for any possible commercial applications.
[0013] As a result of this inherent challenge, some have abandoned the complexities of custom HRIR/HRTF data collection and have adopted binaural-recording methods to capture spatial audio as it is heard in the real world. Binaural recording methods are quite similar to the manner in which HRIR/HRTF data are collected, except they do not aim to synthesize audio through digital signal processing techniques. Binaural recording methods simply place highly sensitive microphones at the left and right ear channels of an anatomically-correct dummy and record a live sound environment. The dummy is usually comprised of the upper-torso, the head, and an averaged sized pinna for the left and right ear. Headphone playback of the recordings, captured with the average-sized dummy as a subject, generally achieve impressive extemalization and localization for most people. This is partly due to the fact that the recording method does not rely on any digital signal processing techniques to replicate the many other acoustical phenomena present in a sound environment such as room reverberation, timbre, and background ambient noise. These factors greatly contribute to the realism of the sound environment.
[0014] Despite the impressive sound acuity resulting from binaural recordings, the methodology itself is quite limited. The anatomical dummy poses a cost barrier to most applications, and it is inherently limited to capturing single-shot moments that cannot be post-processed without losing the spatialization effect. Amidst a world of digital media creation, this inherently-analog method of capturing sound severely truncates the list of useful applications. Although this complex recording method stifles adoption potential, the truly impressive recordings available on the internet, such as QSound's "Virtual Barbershop", have attracted an encouraging amount of attention. The viral popularity of such recordings further suggests the entertainment potential of 3D audio record and replay techniques, but a convenient and widely accessible platform for industry media creators is necessary.
[0015] While some interested parties have resolved to use binaural recordings to capture virtual spatialization effects, there have been several critical research experiments that have led the frontier in synthetically producing binaural sound. In 1994, MIT's Media Lab obtained a collection of impulse responses (HRIRs) using a Knowles Electronics Mannequin for Acoustics Research (KEMAR) dummy head as the experimental subject. The collection represented the HRIR/HRTF data sampled from 710 different spatial positions. The data set was made available to the academic community in order to stimulate further research and development of binaural synthesis techniques. Using this generalized HRIR/HRTF data collection, the synthesis of generalized binaural audio became more accessible to a larger body of academics.
[0016] Recent studies and research were directed towards the assessment of the quality of binaural audio synthesized with this generalized HRIR/HRTF data. In 2008, the US Army Research Laboratory based out of New Mexico strongly confirmed that generalized HRTFs do not allow for accurate localization of sound sources. Half of the participants in their experiment experienced significant localization confusion for sound sources directly in front or behind. The medial (frontal/posterior) sound localization confusion stands as one of the strongest arguments against the practical use of generalized HRTF/HRIR data. In addition to the medial confusion, accurate elevation perception and externalization are difficult to achieve consistently.
[0017] Generalized HRTF/HRIR data has served the key purpose of stimulating further research to find a method of customizing HRIR/HRTF data. In 2001 a publicly-available digital HRIR/HRTF database was generated by the University of California's Center for Image Processing and Integrated Computing (CIPIC). This database marked a significant step away from the generalized filter data acquired by MIT. The CIPIC database captured HRIR/HRTF data, as well as anthropometric ear measurements, for 45 different subjects. This database provided the largest set of user-specific HRIR/HRTF data for public use. In 2003, the University of Maryland's Perceptual Interfaces and Reality Laboratory performed a study that used listener-specific anthropometric ear measurements to match a listener to a close-match HRIR/HRTF collection provided by the CIPIC database. Their experimental results determined that this ear parameter matching method improves localization in some instances, namely where the mean squared error between the measured ear parameters and the anthropometric data associated with the matched HRIR/HRTF is low. This further suggests that even approximate HRIR/HRTF data is not necessarily reliable for constructing accurate spatial sound for a unique listener.
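The ear-parameter matching approach described above can be sketched as follows. The measurement names, toy millimetre values, and function names are illustrative assumptions, not data from the CIPIC database:

```python
def mse(a, b):
    """Mean squared error between two equal-length measurement vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def match_hrtf(listener_ears, database):
    """Return the database key whose measurements have the lowest MSE."""
    return min(database, key=lambda k: mse(listener_ears, database[k]))

# Toy database: subject id -> (pinna height, pinna width, concha depth) in mm.
database = {
    "subject_003": (64.0, 29.0, 11.0),
    "subject_021": (58.5, 31.5, 9.5),
    "subject_040": (70.0, 34.0, 13.0),
}

listener = (59.0, 31.0, 10.0)
print(match_hrtf(listener, database))  # subject_021: lowest MSE for these numbers
```

As the Maryland study suggests, the match is only reliable when the winning MSE is itself small, so a practical implementation would also threshold the error before trusting the matched collection.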
[0018] Over a decade of research has determined that generalized or approximate HRIR/HRTF data will not provide the accurate localization and externalization desired by some applications. Despite this challenge, the academic community continues to explore methods of HRIR/HRTF customization. In 2013, the University of Padova in Italy conducted an experiment relating pinna reflection patterns to HRTF features. Their research was focused on generating what they termed the Pinna Related Transfer Function (PRTF). The PRTF filter could be synthesized based on the locations of prominent anatomical contours of the ear. They used this PRTF in conjunction with a filter representing the size and shape of the head to construct a custom HRTF. The synthetic front-plane HRTF was compared to an acoustically measured front-plane HRTF, with errors as low as 2.38% (ranging up to 18.9%). This suggests that there is a growing body of knowledge capable of parameterizing ear dimensions to synthesize personalized HRTF data.
[0019] As the body of public knowledge regarding customization of HRTF/HRIR data continues to expand, applications taking advantage of customized spatialized audio become more and more feasible. Customized spatial audio could be useful in a vast number of virtual reality environments purposed towards entertainment or simulated training environments. Companies like Sony, Nokia, and Creative Laboratories are currently exploring applications of spatial audio technology in their products, such as headphones, mobile device audio output, and personal computer audio output. The United States military is exploring the use of spatial sound for simulation training applications. Embedding custom HRTF/HRIR filter data on digital signal processing chips could enable fully customizable hearing aid development. Custom spatial audio could be used to accurately guide the blind in audio-based GPS applications, reducing their reliance on guide dogs or caregivers.

[0020] Current commercial products advertising cutting-edge spatial audio have failed to attract sustained attention due to spatial synthesis techniques that utilize only generalized HRTF data. The ability of general HRTFs to produce a realistic sound environment for an individual depends on how closely the individual's own HRTF compares to the general one. For the vast majority of the population, the error between the two is significant enough to cause confusing localization cues, as well as poor externalization. This reduces overall product adoption rates. General HRTF data has been shown by the academic community to produce substandard spatial acuity and free-field acoustic realism. This is also supported by company after company failing to gain traction in the market due to overzealous marketing combined with mediocre production of consistent spatial realism.
[0021] The key reason these one-size-fits-all spatial audio products reach market is the inherent complexity of usefully integrating a multitude of HRIR/HRTF data to deliver a customized sound experience to a large market. Not only would a theoretically unlimited number of unique filters require a large amount of data storage, but delivering customized media to unique listeners would require the capacity to render (or filter) the media of interest on demand. Filtering a single raw audio track with HRTF/HRIR data by convolution is a computationally expensive task. The expense is further increased if multiple sound sources originating from multiple locations are constituents of the spatial environment. Also, the number of sound sources and the localization information required for constructing a spatial sound scene can vary immensely depending on the application, implying inconsistent task complexity levels.
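As a rough illustration of that convolution workload, the following sketch renders two toy mono sources binaurally with two-tap stand-in HRIRs. Real filters run to hundreds of taps per ear, so the cost scales with sources x samples x filter length; all names and values here are hypothetical:

```python
def convolve(signal, taps):
    """Direct (time-domain) FIR convolution of a mono signal with filter taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out

def render_binaural(sources, hrirs):
    """Convolve each equal-length mono source with its (left, right) HRIR pair
    and sum the per-ear results across all sources."""
    n = len(sources[0]) + max(len(h[0]) for h in hrirs) - 1
    left, right = [0.0] * n, [0.0] * n
    for src, (hl, hr) in zip(sources, hrirs):
        for ear, taps in ((left, hl), (right, hr)):
            for i, v in enumerate(convolve(src, taps)):
                ear[i] += v
    return left, right

left, right = render_binaural(
    sources=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    hrirs=[([0.5, 0.25], [0.125, 0.0625]), ([0.125, 0.0625], [0.5, 0.25])],
)
print(left)   # [0.5, 0.375, 0.0625, 0.0]
print(right)  # [0.125, 0.5625, 0.25, 0.0]
```

Production renderers typically replace the direct convolution with FFT-based (overlap-add) convolution, which is what makes parallelized GPU/CPU arrays attractive for serving many listeners.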
[0022] A binaural system is needed that integrates user-specific HRIR/HRTF data collections with a broad range of spatialized media and is capable of filtering any number of sound sources (and their associated locations) with any number of unique HRIR/HRTF collections. It would be advantageous if the system included a simple (preferably hardware-free) method for users to determine their customized HRIR/HRTF data set. It would be a further advantage if a user were provided with the means for their custom HRIR/HRTF data to be applied to various audio media of interest to them, while at the same time the system remained scalable to a large number of rendering requests varying in complexity.
SUMMARY
[0023] The present technology provides a binaural system for the integration of user-specific HRIR/HRTF data collections with a broad range of spatialized media that is capable of filtering any number of sound sources and their associated locations with any number of unique HRIR/HRTF collections. The technology utilizes servers containing extensive storage capacity and arrays of parallelized processing units (GPUs/CPUs). With these components the server has the ability to communicate through the internet and perform complex tasks upon request. The server stores collections of HRIR/HRTF data, as well as audio data and their associated spatialization directives. The server's parallelized processing unit arrays are employed to render media data featuring customized spatial audio.
[0024] According to one aspect there is provided a system for providing custom spatialized audio. The system includes a software module that is utilized by a media producer and includes a media producer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on a device running the software module. A software module is provided that is utilized by a media consumer and includes a consumer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module. An audio apparatus enables the consumer to review and edit the spatialization of the producer rendered audio input. An audio apparatus provides a custom rendered audio output for a media consumer. The producer rendered audio input includes positional data and raw audio that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module. The output of spatialization and rendering using consumer HRIR or HRTF data is accessible for streaming via an internet connection or for downloading to a local media playback device.
[0025] According to another aspect there is provided a method for producing and distributing custom spatialized media to a user. The method includes providing at least one audio track into a spatialization system. The system accesses HRIR or HRTF data on the producer, spatializing the audio track. The system renders the audio track to provide a producer rendered audio track comprising positional data and raw audio. The system utilizes the internet for uploading and storing the rendered audio track permanently or temporarily on a server for future access by a user. The method further includes providing internet access to facilitate the user requesting the producer rendered audio track stored on the internet-accessible server and to facilitate the consumer reviewing and editing the spatialization of the producer rendered audio track. The method finally includes storing permanently or temporarily a statistically approximated HRIR or HRTF as HRIR or HRTF data with respect to each user. The system accesses HRIR or HRTF data on the user stored on the internet-accessible server, spatializing the producer rendered audio track, and rendering the producer rendered audio track to produce a user rendered audio track which may be accessible for streaming via an internet connection or downloading to a local media playback device.

[0026] The system and method, as described above, enable users to obtain custom spatialization of audio over the internet. This is a dramatic development in that it takes a technology that has, to this point, primarily been utilized by academics in research environments with extremely limited public access and makes it available to any member of the public having a device capable of connecting to the internet. It is envisaged that the HRIR or HRTF database will be dynamic and collect a user profile with HRIR or HRTF data from each new user.
Having once used the service, the user will, with every subsequent use of the service, receive an audio file spatialized to correspond with the previously recorded HRIR or HRTF data. It is envisaged that the software module that is utilized will include a positioner for providing positional data to the audio apparatus. This will enable the consumer to render and review the spatial directives to provide "customization" of the spatialization of the audio output as desired. It is also envisaged the user will have the option of saving the user rendered audio track on an internet-accessible server for the purpose of future access. It will be appreciated that the use of the method and system may be extended to audio tracks associated with video.

BRIEF DESCRIPTION OF THE DRAWINGS
[0027] These and other features will become more apparent from the following description in which reference is made to the appended drawings, the drawings are for the purpose of illustration only and are not intended to be in any way limiting, wherein:
[0028] FIG. 1 is a block diagram overview of the system in accordance with the present technology.
DESCRIPTION
[0029] Except as otherwise expressly provided, the following rules of interpretation apply to this specification (written description, claims and drawings): (a) all words used herein shall be construed to be of such gender or number (singular or plural) as the circumstances require; (b) the singular terms "a", "an", and "the", as used in the specification and the appended claims include plural references unless the context clearly dictates otherwise; (c) the antecedent term "about" applied to a recited range or value denotes an approximation within the deviation in the range or value known or expected in the art from the measurement method; (d) the words "herein", "hereby", "hereof", "hereto", "hereinbefore", and "hereinafter", and words of similar import, refer to this specification in its entirety and not to any particular paragraph, claim or other subdivision, unless otherwise specified; (e) descriptive headings are for convenience only and shall not control or affect the meaning or construction of any part of the specification; and (f) "or" and "any" are not exclusive and "include" and "including" are not limiting. Further, the terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to") unless otherwise noted.
[0030] To the extent necessary to provide descriptive support, the subject matter and/or text of the appended claims is incorporated herein by reference in their entirety.
[0031] Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. Where a specific range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is included therein. All smaller subranges are also included. The upper and lower limits of these smaller ranges are also included therein, subject to any specifically excluded limit in the stated range.
[0032] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the relevant art. Although any methods and materials similar or equivalent to those described herein can also be used, the acceptable methods and materials are now described.
[0033] All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the example embodiments and does not pose a limitation on the scope of the claimed invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential.
System Overview
[0034] A binaural audio production and distribution system, generally referred to as 10 is shown in Figure 1. The system 10 includes three modules: the audio production module 100, the media consumer module 300, and the audio services module 200.
Audio Production Module
[0035] The audio production module 100 is any electronic device with the capacity to utilize standard audio production software, store data locally, and send data via an internet connection, for example but not limited to personal computers or modern mobile devices. The audio production module 100 serves as an apparatus for retrieving, from the media producer 109, the raw spatialization data 105, 205 required to create media files that feature customized audio spatialization. The audio production module 100 can also function as a platform for the rendering of media files featuring customized audio spatialization.

Media Consumer Module
[0036] The media consumer module 300 is any device with Internet access and the capability to store or stream media files, for example, but not limited to personal computers or modern mobile devices. The media consumer module 300 is the platform from which a request for media featuring customized audio spatialization originates. The media consumer module 300 can also function as a platform for the rendering of media files featuring customized audio spatialization. Additionally, the media consumer module 300 has the ability to generate raw spatialization data using input from the media consumer, external/embedded hardware, or other software programs running on the system.
Audio Services Module
[0037] The audio services module 200 is an apparatus that contains a number of functional parts required for the rendering and delivery of media featuring spatialized audio. The audio services module 200 may include, but is not limited to, storage for:
Raw spatialization data 205;
Fully-rendered media featuring spatialized audio 206;
Database 203 containing numerous HRIR/HRTF collections that represent media consumer's or media producer's hearing profiles; and
User profiles containing or linking to an HRIR/HRTF collection representing the user's hearing profile 202.
[0038] The audio services module 200 optionally includes an Audio Rendering Apparatus 204 for the rendering of media featuring spatialized audio. The audio services module 200 services requests for media featuring spatialized audio by retrieving and transmitting:
Media that have been rendered and stored on the apparatus prior to receiving the request and/or;
Media that are rendered on demand at the time of the request and/or;
Raw spatialization data 205 to be rendered on the media consumer module 300.

[0039] The audio services module 200 is implemented as a standalone hub or is integrated into an existing media distribution system.
Spatialization Apparatus
[0040] The Spatialization Apparatus 101 is a software program and/or a software program combined with external hardware that uses input from the media producer 109 to add binaural spatialization to audio tracks. The Spatialization Apparatus 101 is implemented as standalone software running on the audio production module 100 or is software executed from within an existing audio creation/editing environment.
[0041] A media producer 109 uses the Spatialization Apparatus 101 in conjunction with un-filtered audio data 107 as the first step in the process of creating spatialized binaural audio customized to the listener. The apparatus has a graphical user interface and/or an external hardware controller to represent and control the location of any sound source(s) in a virtual 3D environment. The Spatialization Apparatus 101 enables the media producer 109 to set the azimuth, elevation and distance of a dry audio track in relation to the virtual location of the listener. The virtual position of the audio track is dynamic, and the Spatialization Apparatus 101 tracks this virtual positioning over the duration of the audio track.
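The dynamic virtual positioning described above can be sketched as keyframed positional data with interpolation between producer-set points. The tuple layout, units, and function names are assumptions for illustration only:

```python
def position_at(keyframes, t):
    """Interpolate a source position at time t.

    keyframes: sorted list of (time_s, (azimuth_deg, elevation_deg, distance_m))
    set by the producer; positions between keyframes are linearly interpolated.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return tuple(a + f * (b - a) for a, b in zip(p0, p1))
    return keyframes[-1][1]

# A source that sweeps from the listener's left to their right over 4 seconds.
keys = [(0.0, (-90.0, 0.0, 1.0)), (4.0, (90.0, 0.0, 1.0))]
print(position_at(keys, 2.0))  # (0.0, 0.0, 1.0): directly ahead at the midpoint
```

The sequence of such interpolated positions over the duration of the track is what the Spatialization Apparatus records as positional data for the later rendering stage.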
[0042] The Spatialization Apparatus 101 allows the media producer 109 to hear playback of the spatialization effects using their own personalized hearing profile (HRIR/HRTF Collection) 102. An HRIR/HRTF collection representing the media producer's 109 hearing profile is present on the audio production module and accessible by the Spatialization Apparatus 101. The apparatus 101 uses this HRIR/HRTF collection 102 to filter any audio tracks to which the media producer 109 has applied spatialization effects so that the output may be monitored in real-time during playback. In essence, the media producer 109 is listening to the spatialization effect as customized for himself.
[0043] The software is used on any number of audio tracks that are then mixed together during the audio rendering process with the audio rendering apparatus 104,204, 304 to form a complete binaural audio file. The purpose of the software is to produce the raw spatialization data for each audio track 105, 205 that is used as an input to the audio rendering stage.
Generated Positional Data
[0044] The generated positional data 108, 208, 308 for any particular audio track may consist of any combination of positional data, room modeling data, or headphone equalization data for each of the audio tracks comprising a complete binaural audio file. Additionally, in the media consumer module 300, the generated positional data 308 may also contain data regarding the current location of the user in reference to the media consumer module. This information, when available, may be used to modify the positional data of each audio track.
[0045] The media consumer module 300 may generate spatialization data 308 by user inputs and/or external hardware inputs. The locally generated position data 308 can be combined with unfiltered audio tracks to form local raw spatialization data that can be input to the Audio Rendering Apparatus 304 on the media consumer module 300.
Raw Spatialization Data
[0046] The raw spatialization data 105, 205, 305 for any particular audio track consists of the unfiltered audio data 107, 207, 307 and generated positional data 108, 208, 308 correlating its location in virtual three dimensional space over the duration of the track. The generated positional data 208 indicates the location of the sound source on either a sample by sample basis or over specified sample/time intervals.
[0047] The raw spatialization data 105, 205, 305 for each audio track is used by the Audio Rendering Apparatus 204 to select the appropriate filter coefficients within an HRIR/HRTF collection 102 that are then convolved with the original audio data 110 to produce time-varying 3D spatialization.

[0048] The audio services module 200 contains raw spatialization data for every media file available for download/streaming.
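The coefficient-selection step just described can be sketched as a nearest-neighbour lookup from positional data into an HRIR collection. The azimuth grid and two-tap filters below are placeholders rather than a real measurement set:

```python
def nearest_hrir(collection, azimuth):
    """Return the (left_taps, right_taps) pair measured at the stored azimuth
    closest to the requested source azimuth.

    collection: dict mapping measured azimuth (deg) -> (left_taps, right_taps).
    """
    key = min(collection, key=lambda a: abs(a - azimuth))
    return collection[key]

# Toy collection: three measured directions with two-tap stand-in filters.
collection = {
    -90.0: ([1.0, 0.5], [0.2, 0.1]),
    0.0: ([0.6, 0.3], [0.6, 0.3]),
    90.0: ([0.2, 0.1], [1.0, 0.5]),
}

# The positional data puts the source at 80 degrees for this interval,
# so the 90-degree filters are selected for the convolution.
left_taps, right_taps = nearest_hrir(collection, 80.0)
print(left_taps)  # [0.2, 0.1]
```

A real collection would index on azimuth, elevation, and distance together, and many renderers interpolate between neighbouring filters rather than snapping to the nearest one.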
Audio Rendering Apparatus
[0049] The Audio Rendering Apparatus 104, 204, 304 is a software program and/or a connected external hardware system that takes the raw spatialization data 105, 205, convolves it with coefficients from an HRIR/HRTF collection 102 and outputs a binaural audio file 106, 206, 209, 306 in any standard streaming/playback format.
[0050] The standard inputs to the Audio Rendering Apparatus 104, 204, 304 are the raw spatialization data 105, 205, 305 output by the Spatialization Apparatus 101 and a single (or multiple) HRIR/HRTF collection(s) 103, 302 or the database 203. Additionally, the Audio Rendering Apparatus 104, 204, 304 can use locally generated location data 305 combined with traditional unfiltered audio 307 as inputs. Additionally, the Audio Rendering Apparatus 304 may utilize input data provided by external media 501 that consists of traditional unfiltered audio 307, and may additionally contain pre-determined positional data to provide the Audio Rendering Apparatus 304 with complete raw spatialization data 305. The output of the apparatus is a binaural 3D media file 106, 206, 209, 306 encoded in standard streaming/playback formats.
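The final mixdown into a single binaural stream, in which the individually spatialized tracks are summed sample by sample, can be sketched as follows (the two-sample tracks and their names are toy placeholders):

```python
def mix(tracks):
    """Sum per-track binaural audio into one stereo stream.

    tracks: list of (left_samples, right_samples) pairs of equal length,
    each already filtered with the same HRIR/HRTF collection.
    """
    left = [sum(samples) for samples in zip(*(t[0] for t in tracks))]
    right = [sum(samples) for samples in zip(*(t[1] for t in tracks))]
    return left, right

# Two already-spatialized tracks, two samples each.
voice = ([0.5, 0.25], [0.125, 0.0625])
rain = ([0.125, 0.0], [0.25, 0.5])
print(mix([voice, rain]))  # ([0.625, 0.25], [0.375, 0.5625])
```

A production mixer would also normalize or limit the summed samples to avoid clipping before encoding the stream in a standard playback format.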
[0051] Each track that is to be spatialized within a multi-track audio stream/file is processed using filter coefficients retrieved from the same HRIR/HRTF collection 102 and selected based on the location information stored in the raw spatialization data 105. The individual spatialized audio tracks are then mixed together into a complete binaural 3D media file 106, 206, 209, 306 encoded in standard streaming/playback formats. In this manner, an audio stream/file is produced that features binaural spatialization unique to that particular HRIR/HRTF collection 102. If a database of HRIR/HRTF collections 103 is available, the Audio Rendering Apparatus 204 can render an output stream/file for each HRIR/HRTF collection 103 in the database.

HRIR/HRTF Calibration Apparatus
[0052] The HRIR/HRTF Calibration Apparatus 401 determines the hearing profile of media consumers 301 and media producers 109 using the audio production module 100. The output of the Calibration Apparatus 401 is a user profile 202, linking into the database 203, that models the media consumer's hearing profile and is uploaded to and stored on the audio services module 200, or downloaded to and stored on the media consumer module 300.
[0053] The Calibration Apparatus 401 is a software program executed on the media consumer module 300, a web application served from an online resource, or a hardware solution for directly measuring the listener's hearing profile. The Calibration Apparatus 401 can either generate a custom HRIR/HRTF collection 302 corresponding to the listener's hearing profile (HRIR/HRTF collection generation), or match the listener to an existing HRIR/HRTF collection in the database 203 that closely models their hearing profile (HRIR/HRTF collection matching). The Calibration Apparatus 401 utilizes various user inputs to determine an appropriate HRIR/HRTF collection that models the user's unique perception of sound. These inputs include but are not limited to:
Anthropometric measurements of the user's head and ears obtained through manual measurement;
Anthropometric measurements of the user's head and ears obtained through digital images;
User feedback obtained through an interactive process (for example but not limited to user performance while playing a game dependent on spatial audio cues);
User feedback obtained through comparison and modification of HRIR/HRTF collections; and
Direct measurement of an HRIR/HRTF collection representing the user's hearing profile.
[0054] The calibration process either matches the user to an HRIR/HRTF collection from the database 203 that closely models their hearing profile or generates a unique HRIR/HRTF collection 302 specific to that user.

[0055] The result of the calibration process is an HRIR/HRTF collection 302 to be used by the audio production module 100 for providing spatialized audio specific to the calibrated user. A reference to the HRIR/HRTF collection 302 is saved to the user's profile 202 for later use in on-demand media rendering or retrieval of pre-rendered media. Alternatively, it could be saved directly to the media consumer module 300.
Archive of HRIR/HRTF Collections
[0056] The database 203 of HRIR/HRTF Collections is a number of HRIR/HRTF collections 102, 302 each corresponding to a unique hearing profile stored on the audio services module 200 and the audio production module 100. A portion of the database can be static, and may contain a fixed number of HRIR/HRTF collections 102, 302. The Calibration Apparatus 401 uses this static portion to determine a media consumer's hearing profile by matching them to their best fit HRIR/HRTF collection from the database 203. The database 203 stored on the audio production module 100 consists of the portion of the HRIR/HRTF database which is static.
[0057] The database can also consist of custom HRIR/HRTF collections output by the Calibration Apparatus 401 or uploaded by the media consumer 301. This portion of the database contains at least one HRIR/HRTF collection for each media consumer 301 that has had a fully customized hearing profile created after undergoing calibration or has uploaded a previously obtained HRIR/HRTF collection. The size of this portion of the database is dynamic and increases as each new customized hearing profile is stored.
User Profiles
[0058] A User Profile 202 is a collection of user data that may contain a generated listening profile represented by a customized HRIR/HRTF collection, which would constitute an entry in the dynamic portion of the database of HRIR/HRTF collections, and/or a link to a matched HRIR/HRTF collection stored within the static portion of the database on the audio services module 200. The user data is uploaded to the User Profile 202 by the described Calibration Apparatus or obtained through other third party means and uploaded by the media consumer 301. A User Profile 202 is stored on the audio services module 200 for each media consumer 301 that wishes to request media featuring spatialized audio.

Fully-Rendered Media
[0059] The fully-rendered media 106, 206 consists of media featuring spatialized audio that is stored on the audio services module 200. The fully-rendered media are either output from the Audio Rendering Apparatus 204 present on the audio services module 200, or output from the Audio Rendering Apparatus 104 and then transmitted to the audio services module 200. Each media file that is available to be downloaded/streamed from the audio services module 200 is rendered once for every HRIR/HRTF collection in a static database and stored as fully-rendered media. Additionally, the Audio Rendering Apparatus 304 found on the media consumer module 300 may output fully-rendered media 306 specifically for the media consumer 301. This particular rendered media will be stored locally on the media consumer's system.
Media Retrieval Apparatus and Spatialization Data Retrieval Apparatus
[0060] The Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 are implemented as part of the audio services module 200. The two apparatuses are implemented as a single functioning unit but are shown separately in Figure 1 for increased clarity. They receive and serve the request from the media consumer 301 for media featuring spatialized audio. The initial input to the Media Retrieval Apparatus 201 and Raw Spatialization Data Retrieval Apparatus 210 is a request originating from the media consumer module 300 for media featuring spatialized audio. The Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 analyze the request to determine what action(s) are to be taken by the audio services module 200.
[0061] The Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 query the User Profiles 202 stored on the audio services module 200 to retrieve the HRIR/HRTF collection(s) representing each media consumer's 301 hearing profile. They can access and transmit the media that was previously rendered and stored on the audio services module 200. Additionally, they can access and transmit the raw spatialization data 205 stored on the audio services module 200, and they can initiate the on-demand rendering process performed by the Audio Rendering Apparatus 304 using either previously stored positional data 208 or user-generated positional data 308. The Media Retrieval Apparatus 201 and Spatialization Data Retrieval Apparatus 210 would then transmit the rendered media.
[0062] If the request is determined to be for media that was previously rendered and stored on the audio services module 200 then the Media Retrieval Apparatus 201 queries the requestor's User Profile 202 to determine the HRIR/HRTF collection matched to the media consumer 301. The Media Retrieval Apparatus 201 then transmits to the media consumer 301 the version of the requested media that was rendered with that particular HRIR/HRTF Collection.
[0063] If the request is for media that needs to be rendered on demand, the Media Retrieval Apparatus 201 retrieves the user's HRIR/HRTF collection associated with the requesting User Profile 202. This HRIR/HRTF collection along with the raw spatialization data 205, 305 for the requested media are input to the Audio Rendering Apparatus 204, 304 and the media is rendered. The Media Retrieval Apparatus then transmits to the media consumer the media rendered with their specific HRIR/HRTF collection.
[0064] If the request is for media with pre-determined spatialization data 210 that is to be rendered on the media consumer module 300, the Media Retrieval Apparatus 201 retrieves and transmits to the media consumer 301 the raw spatialization data 205 for the requested media.
[0065] If the request is for media with user-generated positional data 308 that is to be rendered on the media consumer module 300, the Media Retrieval Apparatus 201 retrieves and transmits to the media consumer 301 the unfiltered audio data 207 for the requested media.
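The four request paths in paragraphs [0062] through [0065] can be sketched as a dispatcher. The request shape, profile mapping, and store layouts below are invented for illustration and are not part of the described apparatus:

```python
def serve_request(request, profiles, prerendered, raw_data, unfiltered):
    """Dispatch a media request to one of the four service paths."""
    kind = request["kind"]
    if kind == "prerendered":
        # [0062]: return the version rendered with the consumer's collection.
        hrtf = profiles[request["user"]]
        return prerendered[(request["media"], hrtf)]
    if kind == "on_demand":
        # [0063]: hand the collection and raw data to the rendering stage.
        hrtf = profiles[request["user"]]
        return ("render", raw_data[request["media"]], hrtf)
    if kind == "pre_positioned":
        # [0064]: ship raw spatialization data for consumer-side rendering.
        return raw_data[request["media"]]
    if kind == "user_positioned":
        # [0065]: ship unfiltered audio; the consumer supplies positions.
        return unfiltered[request["media"]]
    raise ValueError("unknown request kind: " + kind)

profiles = {"alice": "hrtf_17"}
prerendered = {("song", "hrtf_17"): "song_rendered_for_hrtf_17"}
raw_data = {"song": "song_raw_spatialization"}
unfiltered = {"song": "song_unfiltered_audio"}

request = {"kind": "prerendered", "user": "alice", "media": "song"}
print(serve_request(request, profiles, prerendered, raw_data, unfiltered))
```

The branch taken determines where the convolution cost lands: on the services module ahead of time, on the services module at request time, or on the media consumer module.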
HRIR/HRTF Collection
[0066] The HRIR/HRTF collection 102, 302 illustrated in Figure 1 on both the audio production module 100 and media consumer module 300 represents a single HRIR/HRTF collection representing the media producer's 109 and media consumer's 301 hearing profiles respectively. The HRIR/HRTF collection 102 on the audio production module 100 is used by the Spatialization Apparatus 101 to allow the media producer 109 to monitor the spatialized audio they are creating with their own hearing profile. The HRIR/HRTF collection 302 on the media consumer module 300 is used as an input to the Audio Rendering Apparatus 204 for any media rendering performed on the media consumer module 300.
Media Producer
[0067] The media producer 109 is any entity using an audio production module 100 to create media featuring spatialized audio.
Media with Unfiltered Audio
[0068] Media with Unfiltered Audio Data 107, 207, 307 is defined as any media containing audio that has not undergone spatialization and will be used as an input to either the Spatialization Apparatus 101 or the Audio Rendering Apparatus 104.
Media Consumer
[0069] The media consumer 301 is any entity using a media consumer module 300 to request and/or listen to media featuring spatialized audio.
Media Request
[0070] A media request 303 is any request originating from a media consumer 301 to the audio services module 200 for media featuring spatialized audio.
OPERATION
[0071] The operation of the system can vary depending on a number of parameters including:
[0072] the manner in which the media featuring spatialized audio are rendered;
[0073] the manner in which the media are served; and
[0074] the method of obtaining the media consumer's user profile 202 employed by the Calibration Apparatus 401.
Operation of the Audio Rendering Apparatus
[0075] By way of example, a suitable exemplary operation of the audio rendering apparatus comprises the following:
[0076] The media rendering is performed by the Audio Rendering Apparatus 104 functioning on the Audio Production Module 100. Multiple media files are rendered and then uploaded to the audio services module 200. The number of audio files generated depends on the total number of provided HRIR/HRTF collections in the HRIR/HRTF database stored on the audio production module 100 (i.e. a unique media file is produced for each HRIR/HRTF collection within the database). The rendered media files are then packaged, transmitted and stored within the audio services module 200, from which they can be streamed/downloaded.
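The batch step described above, producing one rendered file per HRIR/HRTF collection in the database, can be sketched as follows. This is an illustrative outline under assumed names, not the patent's Audio Rendering Apparatus: `hrir_for` and the naive FIR convolution are simplified stand-ins for real HRIR filtering.

```python
# Hypothetical sketch: render one binaural mix per HRIR/HRTF collection,
# as in paragraphs [0076]/[0077]. All identifiers are illustrative.

def convolve(signal, ir):
    """Naive FIR convolution (stand-in for fast HRIR filtering)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_for_all_collections(sources, hrir_database):
    """sources: list of (samples, position) pairs.
    hrir_database: {collection_name: hrir_for}, where hrir_for(position)
    returns a (left_ir, right_ir) filter pair for that direction.

    Returns one stereo mix per database entry, i.e. a unique media file
    for each HRIR/HRTF collection."""
    rendered = {}
    for name, hrir_for in hrir_database.items():
        mix_l, mix_r = [], []
        for samples, position in sources:
            ir_l, ir_r = hrir_for(position)  # filters for this direction
            for mix, channel in ((mix_l, convolve(samples, ir_l)),
                                 (mix_r, convolve(samples, ir_r))):
                mix.extend([0.0] * (len(channel) - len(mix)))
                for i, v in enumerate(channel):
                    mix[i] += v
        rendered[name] = (mix_l, mix_r)
    return rendered
```

A production system would use FFT-based convolution and write each mix to an audio file, but the per-collection loop structure is the point: the output count scales with the size of the HRIR/HRTF database, matching the description above.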
[0077] The media rendering is performed by the Audio Rendering Apparatus functioning on the audio services module 200. The raw spatialization data 105 output from the Spatialization Apparatus 101 is transmitted from the audio production module 100 and stored on the audio services module 200. The media rendering is then performed within the audio services module 200 using server resources to fully render multiple media files. The number of media files generated depends on the total number of collections in the static portion of the database of HRIR/HRTF collections 203 (i.e. a unique media file is produced for each entry within the database). The entire collection of rendered media files is then stored within the audio services module 200, from which they can be streamed/downloaded.
[0078] During the retrieval of an on-demand request, the media rendering is performed by the Audio Rendering Apparatus functioning on the audio services module 200. The raw spatialization data 205 for the requested media has been previously transmitted from the Media Production System and is stored within the audio services module 200. In this scenario, only one media file is rendered, corresponding to the specific HRIR/HRTF collection associated with the hearing profile of the media consumer issuing the request. Upon receiving the on-demand request, the Media Retrieval Apparatus retrieves the appropriate raw spatialization data as well as the media consumer's HRIR/HRTF collection and passes these inputs to the Audio Rendering Apparatus. The media are then rendered and transmitted to the requestor.
[0079] In another scenario, the media rendering may be performed on-demand by the Audio Rendering Apparatus functioning on the media consumer module 300 using local resources. The rendering is performed using raw spatialization data that will have been transmitted from the audio services module 200 via the Spatialization Data Retrieval Apparatus 210. The media rendering is performed using an HRIR/HRTF collection corresponding to the media consumer's hearing profile that is either stored locally on the media consumer module 300 or downloaded from the audio services module 200 at the time of request. After the media has been rendered it is available for playback or storage on the media consumer module 300.
[0080] The media rendering is performed on-demand by the Audio Rendering Apparatus functioning on the media consumer module 300 using local resources. The rendering is performed using unfiltered audio tracks and positional data that is generated locally either on the media consumer module 300 or hardware attached to the media consumer module 300. The media rendering is performed using a HRIR/HRTF collection corresponding to the media consumer's hearing profile that is stored locally on the media consumer module 300. After the media has been rendered it is available for playback or storage on the media consumer module 300.
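Local real-time rendering of this kind needs to map each locally generated position to filters from the consumer's HRIR/HRTF collection before convolving. A minimal sketch of that selection step follows; the collection layout, the degree grid, and the function names are assumptions for illustration, not the patent's implementation.

```python
# Hypothetical sketch of filter selection for the local rendering path in
# paragraph [0080]: positional data generated during playback selects the
# nearest measured HRIR pair from the consumer's collection.
import math


def nearest_hrir(collection, azimuth, elevation):
    """collection: {(az_deg, el_deg): (ir_left, ir_right)} measured on a grid.
    Returns the filter pair measured closest to the requested direction."""
    def angular_distance(key):
        az, el = key
        # azimuth wraps around at 360 degrees
        daz = min(abs(az - azimuth), 360 - abs(az - azimuth))
        return math.hypot(daz, el - elevation)
    return collection[min(collection, key=angular_distance)]
```

Real systems typically interpolate between neighboring measurements rather than snapping to the nearest one, but nearest-neighbor selection is the simplest way to show how positional data indexes into an HRIR/HRTF collection.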
Operation of the Calibration Apparatus
[0081] By way of example, a suitable exemplary operation of the calibration apparatus comprises the following:
[0082] Anthropometric measurements of the user's head and ears are obtained manually by the user and input to the Calibration Apparatus. The anthropometric data is then used by the Calibration Apparatus to determine the HRIR/HRTF collection from a database of collections that best matches the user. This matched HRIR/HRTF collection has corresponding anthropometric measurements with the highest possible correlation to the input data and is sourced from an existing database stored on the audio services module 200.
[0083] Digital images of the user's head and ears are input into the Calibration Apparatus, which then obtains the required anthropometric measurements from the images. A unique HRIR/HRTF collection specific to the user will then be generated using the obtained measurements and/or the user will be matched to an HRIR/HRTF collection as described above. If a customized HRIR/HRTF collection is generated, it will then be uploaded to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile. Similarly, if matching to an existing HRIR/HRTF collection was performed, the existing collection (stored in the static portion of the database) will also be linked to the User Profile.
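The anthropometric matching in [0082] selects the stored collection whose measurements correlate most strongly with the user's input. One way to sketch this is with a Pearson correlation over measurement vectors; the vector layout and the function names here are hypothetical, and a deployed Calibration Apparatus could use any similarity measure.

```python
# Hypothetical sketch of anthropometric matching ([0082]): pick the
# HRIR/HRTF collection whose measurement vector has the highest
# correlation with the user's measurements. Names are illustrative.
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def best_match(user_measurements, database):
    """database: {collection_id: measurement_vector}.
    Returns the id whose measurements best correlate with the user's."""
    return max(database,
               key=lambda cid: pearson(user_measurements, database[cid]))
```

The matched id would then be linked to the User Profile, as the surrounding paragraphs describe for both the manual-measurement and image-derived cases.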
[0084] The Calibration Apparatus obtains user feedback through an interactive process and uses the feedback to subjectively determine the best performing HRIR/HRTF collection from a database of collections (performance matching). This matched HRIR/HRTF collection is the entry that performs best for the user in the interactive tests and is sourced from an existing database stored on the audio services module 200.
[0085] The Calibration Apparatus obtains user feedback through an interactive process and uses this feedback to generate a customized HRIR/HRTF collection by modifying an existing HRIR/HRTF collection that more closely represents the user's hearing profile. The result of this method is an HRIR/HRTF collection unique to the user. When the customized HRIR/HRTF collection is generated, it is uploaded to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile.
[0086] The user's HRIR/HRTF collection is obtained through direct measurement. Through front-end software, this customized HRIR/HRTF collection is then uploaded by the user to the dynamic portion of the HRIR/HRTF database on the audio services module 200 and linked to the User Profile.
Operation of the Media Request Apparatus
[0087] By way of example, a suitable exemplary operation of the media request apparatus comprises the following:
[0088] The fulfillment of a media request from the user is accomplished by the audio services module 200 in multiple ways, depending on the nature of the media request.
[0089] The media request can be serviced by providing rendered media directly to the user. This rendered media will have been previously processed by the Audio Rendering Apparatus 104, 204 and is stored within the audio services module 200 in the rendered media form. The rendered media are selected for the user by cross-referencing the HRIR/HRTF collection linked to the User Profile during the calibration process to the corresponding media file rendered with that specific HRIR/HRTF collection.
[0090] The media request can be serviced by "on-demand" rendering within the audio services module 200. In this case, the media are rendered by the Audio Rendering Apparatus using the corresponding raw spatialization data and the HRIR/HRTF collection linked to the User Profile, both of which are stored on the audio services module 200. Once the media file has been rendered, it is transmitted to the media requestor.
[0091] The media request is serviced by "on-demand" rendering performed on the media consumer module 300. In this case, upon receiving a media request the audio services module 200 transmits the raw spatialization data for the requested media to the Audio Rendering Apparatus located on the media consumer module 300. An HRIR/HRTF collection corresponding to the media requestor's hearing profile will also be transmitted from the audio services module 200. Using the above inputs, the media are then rendered by the Audio Rendering Apparatus on the media consumer module 300, and are available for playback/streaming to the media consumer.
[0092] The media request is serviced by "on-demand" rendering performed on the media consumer module 300. In this case, upon receiving a media request the audio services module 200 transmits the raw spatialization data for the requested media to the Audio Rendering Apparatus located on the media consumer module 300. Along with the raw spatialization data, one or more HRIR/HRTF collections corresponding to the user are gathered from internal storage on the media consumer module 300. Using the above inputs, the media are then rendered by the Audio Rendering Apparatus on the media consumer module 300, and are available for playback/streaming to the media consumer.
Operation of the System
[0093] The present technology can be explained by way of the following examples.
Example 1:
[0094] A media producer wants to create a song featuring spatialized audio that may be downloaded by media consumers. The consumer will be able to download a version of the song that is specific to their hearing profile. In this way, the media producer is assured that the consumer will hear the song as intended. The song is uploaded from the audio production module 100 to the audio services module 200 in the form of raw spatialization data.
[0095] The media producer is creating the song within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
[0096] The media producer undergoes the hearing profile calibration process before using the Spatialization Apparatus. The media producer uses their mobile phone to capture images of their head and ears. The calibration procedure is run as an application on their mobile device, and performs matching to link an HRIR/HRTF collection representing their hearing profile to a User Profile on the audio services module.
[0097] The media producer downloads their hearing profile from the audio services module, and attaches it to the Spatialization Apparatus. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
[0098] The media producer uses the Spatialization Apparatus to apply binaural spatial effects to the desired audio tracks. The digital audio workstation allows the producer to monitor the final mix of the multi-track song when all of the individually spatialized tracks are played together.
[0099] When satisfied with the song, the media producer will export the raw spatialization data 105 for the song from their system to the audio services module 200. The raw spatialization data comprises the generated positional data 108 and unfiltered audio data 107. This exporting/uploading step could be performed within the Spatialization Apparatus or using a standalone application that has access to the raw spatialization data.
[00100] Once the uploading of the raw spatialization data is complete, the song is available to be downloaded/streamed from the audio services module 200. Access to the media is made available to users through on-demand rendering provided by the Audio Rendering Apparatus 204, which utilizes HRIR/HRTF data associated with the user profile 202 generating the request.
Example 2:
[00101] A media producer wants to create a video featuring spatialized audio that may be viewed by media consumers. The consumer will be able to download or stream a version of the video that features spatialized audio specific to their hearing profile. In this way, the media producer is assured that the consumer will hear the audio in the video playback as intended. The video is uploaded from an audio production module 100 to the audio services module 200 in the form of fully rendered video files.
[00102] The media producer is editing the video's accompanying audio within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
[00103] The media producer undergoes the hearing profile calibration process before using the Spatialization Apparatus. The producer uses his/her personal computer to play an interactive game that incorporates user feedback to determine the HRIR/HRTF collection that best matches their hearing profile. This HRIR/HRTF collection representing their hearing profile is linked to a user profile 202 on the audio services module 200.
[00104] The media producer downloads a file containing their hearing profile from the audio services module 200, and attaches it to the Spatialization Apparatus. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
[00105] The media producer uses the Spatialization Apparatus to apply binaural spatial effects to the desired audio tracks. The DAW allows the producer to monitor the final mix of the multi-track audio file when the individually spatialized tracks are played together.
[00106] When satisfied with the audio track, the Audio Rendering Apparatus on the audio production module 100 will be used to render a number of unique copies of the media's audio track. The rendering program will have access to the Raw Spatialization Data output by the Spatialization Apparatus and a database of static HRIR/HRTF collections stored on the Media Production System. A version of the audio track will be rendered for each entry in the database of HRIR/HRTF collections.
[00107] After the rendering process is complete, the fully rendered media files will be sent to the audio services module, where they will then be available for download/streaming by a media consumer.
Example 3:
[00108] A media producer is creating the soundtrack for a PC video game. When playing the video game, the accompanying soundtrack will be spatialized using the HRIR/HRTF collection corresponding to a media consumer's profile. In addition, the PC game production company has also opted to include accompanying spatialized sound effects during gameplay (which will also be filtered using the HRIR/HRTF collection corresponding to the media consumer's profile). These spatialized sound effects will have their position vectors and audio generated during gameplay on the media consumer module 300. In this way the game producer is assured that the consumer will hear all audio in the game, including the correct positional cues from sound effects, as intended. The game may be sold via any retail method, and must be linked during installation to the consumer's hearing profile.
[00109] The media producer is editing the game's accompanying music within a Digital Audio Workstation (DAW) that supports 3rd party audio software.
[00110] The media producer undergoes the hearing profile calibration process before using the spatialization software. The producer measures and submits to the Calibration Apparatus all necessary anthropometric measurements required to determine the HRIR/HRTF collection that best matches their hearing profile. This HRIR/HRTF collection representing their hearing profile is linked to a User Profile on the audio services module 200.
[00111] The media producer downloads their hearing profile from the audio services module 200, and attaches it to the spatialization software. This allows the producer to monitor the spatialization effects added to each audio track using their specific HRIR/HRTF collection.
[00112] The media producer uses the spatialization software to apply binaural spatial effects to the desired audio stems. The DAW allows the producer to monitor the final mix of the multi-track audio file when the individually spatialized tracks are played together.
[00113] When the producer is satisfied with the audio track, the Audio Rendering Apparatus 104 running on the audio production module 100 will be used to render a number of unique copies of the media's audio track. The rendering program will have access to the raw spatialization data output by the Spatialization Apparatus and a database of HRIR/HRTF collections stored on the Media Production System. A version of the audio track will be rendered for each entry in the database of HRIR/HRTF collections.
[00114] After the rendering process is complete, the standalone program will export/upload the collection of fully rendered audio files to the audio services module 200 where it will then be available for download by a media consumer during the PC game installation.
Example 4:
[00115] A media consumer wishes to download a customized spatial audio track. The audio services module 200 is integrated within a publicly accessible distribution website. In this case, the media requested will be rendered on-demand by the Audio Rendering Apparatus using the raw spatialization data and the consumer's specified HRIR/HRTF collection, all of which are stored on the audio services module 200. Once the rendered media file is successfully downloaded, the media consumer may then play back the customized audio track with any consumer electronic device that supports standard audio output formats. Based on the consumer's selections during the download process, the audio track may be optimized for spatial effects while listening with either conventional speaker set-ups or any standard pair of headphones/earbuds.
[00116] The consumer navigates to the media distribution website to create a personal account and to calibrate their hearing profile. Using their webcam, photos are taken of the consumer's head at a requested number of different angles. These photos are then submitted to the online Calibration Apparatus 401, which derives the necessary anthropometric measurements from the images to generate a custom HRIR/HRTF collection representing the consumer's hearing profile.
[00117] Upon selection and/or purchase of the desired audio track by the consumer, the Media Retrieval Apparatus retrieves and inputs to the Audio Rendering Apparatus the consumer's custom HRIR/HRTF collection and the pre-filtered spatialization data for the song.
[00118] The custom audio track is rendered in the requested audio file format and transmitted to the media consumer by the Media Retrieval Apparatus. Once the consumer has successfully downloaded the rendered audio track it is deleted from the audio services module 200.
Example 5:
[00119] A media consumer wishes to stream a live concert video that features matched spatialized audio, and the audio services module 200 is integrated with a subscription streaming media provider. The concert footage and raw spatialization data for the audio were previously uploaded to the audio services module 200 by the media producer. Using the uploaded data, the concert video has been fully rendered on the audio services module 200 by the Audio Rendering Apparatus for each HRIR/HRTF collection in a database stored on the server. The media consumer must undergo the hearing profile calibration process so that the Media Retrieval Apparatus can retrieve the version of the concert video that was rendered using the HRIR/HRTF collection to which the media consumer is matched.
[00120] The consumer navigates to the media distribution website to create a personal account and calibrate their hearing profile.
[00121] The calibration program is downloaded and installed on the media consumer module 300 where it guides the consumer through a series of qualitative spatialized audio comparisons. The calibration program uses the consumer's feedback to determine the entry in a database of HRIR/HRTF collections that best matches their hearing profile. This matched HRIR/HRTF collection is then linked to the consumer's user profile 202.
[00122] The consumer navigates through the subscription service and selects the desired live concert video. Upon initiation of playback, the audio services module 200 streams the version of the fully rendered media that was rendered using the HRIR/HRTF collection associated with the consumer's subscription user profile.
Example 6:
[00123] A media consumer wishes to install and play a downloaded video game that features spatialized audio. The Media Request Apparatus is integrated within the game installation software. This enables the installation software to retrieve the in-game soundtrack media featuring spatialized audio that was pre-rendered by the video game developers using the matched hearing profiles from the static portion of the HRIR/HRTF database. The media consumer installing the video game will download pre-rendered audio that has been filtered with the closest-match HRIR/HRTF collection associated with their user profile. This pre-rendered media will comprise audio for the video game's passive playback segments. Additionally, a copy of the consumer's fully customized HRIR/HRTF collection from the dynamic portion of the HRIR/HRTF database is stored on the media consumer module 300 to enable the game's audio engine (acting as the Audio Rendering Apparatus on the media consumer module 300) to perform real-time rendering of spatialized environmental sounds during live gameplay. In this way, the consumer may experience both matched spatialization added to the game's soundtrack and a fully customized spatial effect on all in-game sound effects.
[00124] The consumer is provided with a custom HRIR/HRTF collection after undergoing direct measurement at a 3rd-party facility. The directly measured hearing profile is uploaded and linked to their online gaming account and stored on the media consumer module 300.
[00125] The consumer navigates through an online game distribution application and selects a game for download. The game installation service is either downloaded or built into the media consumer module 300.
[00126] During gameplay when a new character is created, the consumer will be prompted to link their User Profile to the game. When accepted and linked, the built-in Media Request Apparatus queries the Media Retrieval Apparatus for the Fully Rendered Media files corresponding to the soundtrack rendered using the consumer's matched HRIR/HRTF collection from the static portion of the database. It will also retrieve the custom HRIR/HRTF collection previously uploaded to the media consumer's online gaming account.
[00127] The in-game audio engine performs real-time rendering of spatialized environmental sounds while the consumer is playing the video game. Current positional data of the user's head and/or body may also be tracked through the use of hardware connected to the system. The location data used to select the appropriate filters from the HRIR/HRTF collection is acquired through the consumer's interaction with the virtual game environment and the relative position of the user to the system. Meanwhile, the game soundtrack consisting of fully rendered audio files can be directly played during gameplay.
[00128] Advantages of the exemplary embodiments described herein may be realized and attained by means of the instrumentalities and combinations particularly pointed out in this written description. It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims below. While example embodiments have been described in detail, the foregoing description is in all aspects illustrative and not restrictive. It is understood that numerous other modifications and variations can be devised without departing from the scope of the example embodiment.
[00129] The scope of the claims should not be limited by the illustrated embodiments set forth as examples, but should be given the broadest interpretation consistent with a purposive construction of the claims in view of the description as a whole.

Claims

I CLAIM:
1. A system for providing custom spatialized audio, the system comprising:
a software module that is utilized by a media producer comprising a media producer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module;
a software module that is utilized by a media consumer comprising a consumer HRIR or HRTF data source that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module;
an audio apparatus that enables the consumer to review and edit the spatialization of the producer rendered audio input; and
an audio apparatus that provides a custom rendered audio output for a media consumer, the producer rendered audio input comprising positional data and raw audio that may be accessed through an internet connection and stored permanently or temporarily on the device running the software module, the spatialization and rendering using consumer HRIR or HRTF data which may be accessible for streaming via an internet-connection or downloading to a local media playback device.
2. The system of Claim 1, wherein a software module enables the producer to set the spatial locations of audio and an audio apparatus that enables the producer to render and review the spatial directives for an audio input from a media producer to provide a rendered audio input, the rendered audio input comprising positional data and raw audio that may be uploaded through an internet connection and stored permanently or temporarily on an internet-accessible server for future access, the spatialization and rendering using producer HRIR or HRTF data.
3. The system of claim 1, wherein the HRIR or HRTF database is dynamic and collects a media consumer profile comprising HRIR or HRTF data from each new user.
4. The system of claim 3, further comprising a software module that provides web-based audio services, the module comprising storage for rendered audio input comprising positional data and raw audio and a HRIR or HRTF database accessible to both the software module that is utilized by the media producer and the software module that is utilized by the media consumer for spatialization and rendering to produce a user rendered audio track.
5. The system of claim 1, wherein the software module that is utilized by a media consumer further comprises a positioner for providing positional data on the media consumer to the audio apparatus that enables the consumer to render and review the spatial directives.
6. A method for producing and distributing custom spatialized media to a user, the method comprising:
providing at least one audio track into a spatialization system, the system accessing HRIR or HRTF data on the producer, the system spatializing the audio track, and the system rendering the audio track to provide a producer rendered audio track comprising positional data and raw audio, the system utilizing the internet for uploading and storing the rendered audio track permanently or temporarily on a server for future access by a user;
providing internet access to facilitate the user requesting the producer rendered audio track stored on the internet-accessible server and facilitate the consumer reviewing and editing the spatialization of the producer rendered audio track; and
storing permanently or temporarily a statistically approximated HRIR or HRTF as HRIR or HRTF data with respect to each user, the system accessing HRIR or HRTF data on the user stored on the internet-accessible server, the system spatializing the producer rendered audio track, and the system rendering the producer rendered audio track, to produce a user rendered audio track for streaming via an internet-connection or downloading to a local media playback device.
7. The method of claim 6, further comprising storing the user rendered audio track for future access.
8. The method of claim 6, wherein video is associated with the at least one audio track.
PCT/CA2014/000603 2013-08-05 2014-08-01 Media production and distribution system for custom spatialized audio WO2015017914A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361862193P 2013-08-05 2013-08-05
US61/862,193 2013-08-05

Publications (1)

Publication Number Publication Date
WO2015017914A1 (en) 2015-02-12


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10306396B2 (en) 2017-04-19 2019-05-28 United States Of America As Represented By The Secretary Of The Air Force Collaborative personalization of head-related transfer function
US10535355B2 (en) 2016-11-18 2020-01-14 Microsoft Technology Licensing, Llc Frame coding for spatial audio data
US10839545B2 (en) 2016-03-15 2020-11-17 Ownsurround Oy Arrangement for producing head related transfer function filters
US10937142B2 (en) 2018-03-29 2021-03-02 Ownsurround Oy Arrangement for generating head related transfer function filters
US11026039B2 (en) 2018-08-13 2021-06-01 Ownsurround Oy Arrangement for distributing head related transfer function filters
CN113821190A (en) * 2021-11-25 2021-12-21 广州酷狗计算机科技有限公司 Audio playing method, device, equipment and storage medium
CN115023958A (en) * 2019-11-15 2022-09-06 博姆云360公司 Dynamic rendering device metadata information audio enhancement system
EP4262241A4 (en) * 2020-12-09 2024-05-08 Sony Group Corp Reproduction apparatus, reproduction method, information processing apparatus, information processing method, and program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008008730A2 (en) * 2006-07-08 2008-01-17 Personics Holdings Inc. Personal audio assistant device and method


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839545B2 (en) 2016-03-15 2020-11-17 Ownsurround Oy Arrangement for producing head related transfer function filters
US11823472B2 (en) 2016-03-15 2023-11-21 Apple Inc. Arrangement for producing head related transfer function filters
US10535355B2 (en) 2016-11-18 2020-01-14 Microsoft Technology Licensing, Llc Frame coding for spatial audio data
US10306396B2 (en) 2017-04-19 2019-05-28 United States Of America As Represented By The Secretary Of The Air Force Collaborative personalization of head-related transfer function
US10937142B2 (en) 2018-03-29 2021-03-02 Ownsurround Oy Arrangement for generating head related transfer function filters
US11026039B2 (en) 2018-08-13 2021-06-01 Ownsurround Oy Arrangement for distributing head related transfer function filters
US20230232179A1 (en) * 2018-08-13 2023-07-20 Apple Inc. Arrangement for distributing head related transfer function filters
CN115023958A (en) * 2019-11-15 2022-09-06 博姆云360公司 Dynamic rendering device metadata information audio enhancement system
EP4262241A4 (en) * 2020-12-09 2024-05-08 Sony Group Corp Reproduction apparatus, reproduction method, information processing apparatus, information processing method, and program
CN113821190A (en) * 2021-11-25 2021-12-21 Guangzhou Kugou Computer Technology Co., Ltd. Audio playing method, device, equipment and storage medium
CN113821190B (en) * 2021-11-25 2022-03-15 Guangzhou Kugou Computer Technology Co., Ltd. Audio playing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9131305B2 (en) Configurable three-dimensional sound system
Zhang et al. Surround by sound: A review of spatial audio recording and reproduction
WO2015017914A1 (en) Media production and distribution system for custom spatialized audio
CN109644314B (en) Method of rendering sound program, audio playback system, and article of manufacture
CN101263741B (en) Method of and device for generating and processing parameters representing HRTFs
Jot et al. Augmented reality headphone environment rendering
CN111294724B (en) Spatial repositioning of multiple audio streams
CN112005559B (en) Method for improving positioning of surround sound
US8638946B1 (en) Method and apparatus for creating spatialized sound
Lindau Binaural resynthesis of acoustical environments: Technology and perceptual evaluation
EP3777249A1 (en) An apparatus, a method and a computer program for reproducing spatial audio
Comanducci Intelligent networked music performance experiences
WO2012104297A1 (en) Generation of user-adapted signal processing parameters
Yuan et al. Sound image externalization for headphone based real-time 3D audio
Lindau Binaural resynthesis of acoustical environments
Sunder et al. Virtual Studio Production Tools with Personalized Head Related Transfer Functions for Mixing and Monitoring Dolby Atmos and Multichannel Sound
Grond et al. Spaced AB placements of higher-order Ambisonics microphone arrays: Techniques for recording and balancing direct and ambient sound
Vorländer Virtual acoustics: opportunities and limits of spatial sound reproduction
San Martín et al. Influence of recording technology on the determination of binaural psychoacoustic indicators in soundscape investigations
Zea Binaural In-Ear Monitoring of acoustic instruments in live music performance
Surdu et al. A. LI. EN: An Audiovisual Dataset of different Acoustical Impulse Responses Measured in a Living Room Environment
Rudzki Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics
Ballivian Creating, Capturing and Conveying Spatial Music: An Open-Source Approach
Angelucci et al. Binaural spatialization: Comparing head related transfer function models for use in virtual and augmented reality applications
WO2023173285A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14833825

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 20.05.2016)

122 EP: PCT application non-entry into the European phase

Ref document number: 14833825

Country of ref document: EP

Kind code of ref document: A1