WO2015023685A1 - System and method for producing multidimensional parametric audio - Google Patents

System and method for producing multidimensional parametric audio

Info

Publication number
WO2015023685A1
Authority
WO
WIPO (PCT)
Prior art keywords
listener
adjusted
sound
hrtf
audio
Application number
PCT/US2014/050759
Other languages
English (en)
Inventor
Richard Joseph KULAVIK
Elwood Grant NORRIS
Brian Alan KAPPUS
Original Assignee
Turtle Beach Corporation
Priority claimed from US13/969,292 (US20140050325A1)
Application filed by Turtle Beach Corporation
Priority claimed from US14/457,588 (US9271102B2)
Publication of WO2015023685A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2217/00 Details of magnetostrictive, piezoelectric, or electrostrictive transducers covered by H04R15/00 or H04R17/00 but not provided for in any of their subgroups
    • H04R2217/03 Parametric transducers where sound is generated or captured by the acoustic demodulation of amplitude modulated ultrasonic waves
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation

Definitions

  • the present invention relates generally to audio systems, and more particularly, some embodiments relate to multi-dimensional audio processing for ultrasonic audio systems.
  • Surround sound or audio reproduction from various positions about a listener can be provided using several different methodologies.
  • One technique uses multiple speakers encircling the listener to play audio from different directions.
  • An example of this is Dolby® Surround Sound, which uses multiple speakers to surround the listener.
  • The Dolby 5.1 process digitally encodes five channels (plus a subwoofer) of information onto a digital bitstream. These are the Left Front, Center Front, Right Front, Surround Left, and Surround Right channels. Additionally, a Subwoofer output is included (designated by the ".1").
  • a stereo amplifier with Dolby processing receives the encoded audio information and decodes the signal to derive the 5 separate channels. The separate channels are then used to drive five separate speakers (plus a subwoofer) placed around the listening position.
  • Dolby 6.1 and Dolby 7.1 are extensions of Dolby 5.1.
  • Dolby 6.1 includes a Surround Back Center channel.
  • Dolby 7.1 adds left and right back speakers that are preferably placed behind the listening position and the surround speakers are set to the sides of the listening position. An example of this is provided in FIG. 1.
  • The conventional Dolby 7.1 system includes Left Front (LF), Center, Right Front (RF), Left Surround (LS), Right Surround (RS), Back Surround Left (BSL), and Back Surround Right (BSR) channels. Additionally, a Subwoofer, or Low Frequency Effects (LFE), channel is shown.
  • decoders at the audio amplifier decode the encoded information in the audio stream and break up the signal into its constituent channels - e.g., 7 channels plus a subwoofer output for Dolby 7.1.
  • the separate channels are amplified and sent to their respective speakers.
  • Dolby 7.1 and other multi-speaker surround sound systems require more than two speakers.
  • multi-speaker surround sound systems require placement of the speakers around the listening environment. These requirements can lead to increased cost, additional wiring and practical difficulties with speaker placement.
  • the sound created by the conventional speakers is always produced on the face of the speaker (i.e., at the speaker cone). The sound wave created at the surface propagates through the air in the direction at which the speaker is pointed.
  • the sound will appear to be closer or farther away from the listener depending on how far away from the listener the speaker is positioned. The closer the listener is to the speaker, the closer the sound will appear. The sound can be made to appear closer by increasing the volume, but this effect is limited.
  • Speakers may be placed to 'surround' the listener, but the sound is produced at discrete points along the perimeter corresponding to the positions of the speakers. This is apparent when listening to content in a surround-sound environment. In such environments, the sound can appear to move from one speaker to another, but it always sounds like its source is the speaker itself, which it is. Phasing can have the effect of blending sound between speakers, but conventional surround sound systems cannot achieve placement or apparent placement of sound in the environment at determined distances from a listener or listening location.
  • a parametric audio encoder in an audio system is configured to process a sound channel into left input and right input channel signals; apply HRTF filters to the left and right input channel signals to generate adjusted left and adjusted right channel signals; apply acoustic crosstalk cancellation filters to the adjusted left and adjusted right channel signals; and modulate the left and right output channel signal frequencies onto an ultrasonic carrier to generate modulated left output and right output channel signals for playback by a left ultrasonic emitter and a right ultrasonic emitter.
  • HRTF filters for the left and right ears of a listener are determined by scanning the listener with an optical imaging system to determine a profile of the listener.
  • the profile of the listener comprises the head, pinna, and torso measurements of the listener.
  • the HRTF filters are determined by comparing the scanned profile of the user with a predetermined set of HRTF profiles, each profile including a predetermined range of head, pinna, and torso measurements; and automatically selecting one of the predetermined set of HRTF profiles.
  • determining HRTF filters for the left and right ears of the listener includes playing a plurality of sound samples at a predetermined frequency; recording the sound samples at a plurality of microphones, placed in the listener's left and right ears during recording; and recording the listener's position relative to the left and right ultrasonic emitters using the optical imaging system when each sound sample is recorded.
  • applying acoustic crosstalk cancellation filters to the adjusted left and adjusted right channel signals to generate left and right output channel signals includes: phase inverting the adjusted right channel signal and the adjusted left channel signal; adding a delay to the phase inverted right channel signal and to the phase inverted left channel signal; combining the adjusted left channel signal with the delayed phase inverted adjusted right channel signal to generate the left output channel signal; and combining the adjusted right channel signal with the delayed phase inverted adjusted left channel signal to generate the right output channel signal.
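The crosstalk cancellation steps listed above (invert, delay, combine with the opposite channel) can be sketched as follows. This is a minimal illustration on plain sample lists, not the circuitry of the disclosure; the function names and the whole-sample `delay_samples` parameter are assumptions made for the example.

```python
def delay(signal, n):
    """Delay a list of samples by n samples, zero-padding the start."""
    return ([0.0] * n + list(signal))[:len(signal)]


def crosstalk_cancel(adj_left, adj_right, delay_samples):
    """Return (out_left, out_right) from the HRTF-adjusted channel signals.

    Each output is its own adjusted channel plus a delayed, phase-inverted
    copy of the opposite adjusted channel.
    """
    # Phase-invert both adjusted channels.
    inv_left = [-s for s in adj_left]
    inv_right = [-s for s in adj_right]
    # Delay the inverted copies (hypothetical whole-sample delay).
    d_inv_left = delay(inv_left, delay_samples)
    d_inv_right = delay(inv_right, delay_samples)
    # Combine each channel with the delayed, inverted opposite channel.
    out_left = [l + r for l, r in zip(adj_left, d_inv_right)]
    out_right = [r + l for r, l in zip(adj_right, d_inv_left)]
    return out_left, out_right
```

A practical canceller would likely also scale the inverted copy and use fractional delays; those refinements are left out because the text above does not specify them.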
  • Figure 1 illustrates the conventional Dolby® Surround Sound configuration, with components for Dolby 5.1, 6.1, or 7.1 configurations.
  • Figure 2 illustrates an example encoding and decoding process in accordance with various embodiments of the technology described herein.
  • Figure 3 is a flow diagram of the method of creating a parametric audio signal from a signal previously encoded for use in a conventional surround sound system in accordance with various embodiments of the technology described herein.
  • Figure 4 is a flow diagram of the method of encoding an audio component to produce a parametric audio signal in accordance with various embodiments of the technology described herein.
  • Figure 5A is a diagram illustrating example circuitry of a parametric encoder that may be implemented to encode a sound channel into left and right ultrasonic frequency modulated output channel signals in accordance with various embodiments of the technology described herein.
  • Figure 5B is an operational flow diagram illustrating an example method of encoding a sound channel that may be implemented with the parametric encoder circuitry of Figure 5A.
  • Figure 6A illustrates an example embodiment of the invention where ultrasonic emitters direct the parametric audio signal directly towards either the left or right sides of a particular listening position.
  • Figure 6B illustrates an example embodiment of the invention where ultrasonic emitters reflect the parametric audio signal off a wall, ceiling, and/or floor.
  • Figure 7 illustrates an example hybrid embodiment in which the method of parametric audio production and ultrasonic emitters in accordance with embodiments of the invention are combined with a conventional surround sound configuration.
  • Figure 8 illustrates an example computing module that may be used in implementing various features of embodiments of the technology described herein.
  • Embodiments of the systems and methods described herein provide multidimensional audio or a surround sound listening experience using as few as two emitters.
  • Monaural and Stereo playback has been achieved using non-linear transduction through a parametric array.
  • Non-linear transduction, such as a parametric array in air, results from the introduction of audio-modulated ultrasonic signals into an air column.
  • Self-demodulation, or down-conversion, occurs along the air column, resulting in the production of an audible acoustic signal. This process occurs because of the known physical principle that when two sound waves of sufficient intensity with different frequencies are radiated simultaneously in the same medium, a modulated waveform including the sum and difference of the two frequencies is produced by the non-linear (parametric) interaction of the two sound waves.
  • If the two original sound waves are ultrasonic waves and the difference between them is selected to be an audio frequency, an audible sound can be generated by the parametric interaction.
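The sum-and-difference principle can be checked numerically. The sketch below is an illustration only: it stands in for the non-linear medium with a pure quadratic term (a strong simplification of real acoustics), and the sample rate, tone frequencies and helper name `tone_amplitude` are assumed values chosen for the example. For two unit-amplitude tones, squaring their sum yields unit-amplitude components at both the difference and the sum frequency.

```python
import cmath
import math

FS = 400_000               # sample rate in Hz (assumed, well above both tones)
N = 4_000                  # 10 ms of signal, a whole number of cycles of every tone
F1, F2 = 40_000, 41_000    # two ultrasonic tones (illustrative values)


def tone_amplitude(signal, freq, fs):
    """Amplitude of the component of `signal` at `freq`, via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


# Linear superposition of the two ultrasonic tones: no audible component.
linear = [math.cos(2 * math.pi * F1 * n / FS) + math.cos(2 * math.pi * F2 * n / FS)
          for n in range(N)]

# Crude stand-in for the non-linear interaction: a pure quadratic term.
nonlinear = [x * x for x in linear]

diff_before = tone_amplitude(linear, F2 - F1, FS)    # 1 kHz, before the non-linearity
diff_after = tone_amplitude(nonlinear, F2 - F1, FS)  # 1 kHz, after the non-linearity
sum_after = tone_amplitude(nonlinear, F1 + F2, FS)   # 81 kHz, after the non-linearity
```

Only the 1 kHz difference component falls in the audible band; the sum component at 81 kHz remains ultrasonic.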
  • various components of the audio signal can be processed such that the signal played through ultrasonic emitters creates a multi-dimensional sound effect.
  • a three-dimensional effect can be created using only two channels of audio, thereby allowing as few as two emitters to achieve the effect.
  • In other embodiments, other quantities of channels and emitters are used.
  • the ultrasonic transducers, or emitters, that emit the ultrasonic signal can be configured to be highly directional. Accordingly, a pair of properly spaced emitters can be positioned such that one of the pair of emitters targets one ear of the listener or a group of listeners, and the other of the pair of emitters targets the other ear of the listener or group of listeners.
  • the targeting can but need not be exclusive. In other words, sound created from an emitter directed at one ear of the listener or group of listeners can 'bleed' over into the other ear of the listener or group of listeners.
  • adjusting the parameters of the signal, frequency components of the signal, or other signal components on the two ultrasonic channels (more channels can be used) relative to each other—such as the phase, delay, gain, reverb, echo, or other audio parameters— allows the audio reproduction of that signal or of component(s) within that signal, to appear to be positioned at a predetermined or desired location (not necessarily the speaker location) in the space about the listener(s).
  • the audio can be generated by demodulation of the ultrasonic carrier in the air between the ultrasonic emitter and the listener (sometimes referred to as the air column).
  • the actual sound is created at what is effectively an infinite number of points in the air between the emitter and the listener and beyond the listener. Therefore, in various embodiments these parameters are adjusted to emphasize an apparent sound generated at a chosen location in space. For example, the sound created (e.g., for a component of the audio signal) at a desired location can be made to appear to be emphasized over the sound created at other locations. Accordingly, with just one pair of emitters (e.g., a left and right channel), the sound can be made to appear to be generated at a point along one of the paths from the emitter to the listener at a point closer to or farther from the listener, whether in front of or behind the listener.
  • the parameters can also be adjusted so that sound appears to come from the left or right directions at a predetermined distance from the listener. Accordingly, two channels can provide a full 360 degree placement of a source of sound around a listener, and at a chosen distance from the listener. As also described herein, different audio components or elements can be processed differently, to allow controlled placement of these audio components at their respective desired locations within the channel. Adjusting the audio on two or more channels relative to each other allows the audio reproduction of that signal or signal component to appear to be positioned in space about the listener(s). Such adjustments can be made on a component or group of components (e.g., Dolby or other like channel, audio component, etc.) or on a frequency-specific basis.
  • adjusting phase, gain, delay, reverb, and echo, or other audio processing of a single signal component can also allow the audio reproduction of that signal component to appear to be positioned in a predetermined location in space about the listener(s). This can include apparent placement in front of or behind the listener. Additional auditory characteristics, such as, for example, sounds captured from auditorium microphones placed in the recording environment (e.g., to capture hall or ambient effects), may be processed and included in the audio signal (e.g., blending with one or more components) to provide more realism to the three-dimensional sound.
  • the parameters can be adjusted based on frequency components.
  • Various audio components are created with a relative phase, delay, gain, echo and reverb or other effects built into the audio component such that they can be placed in spatial relation to the listening position upon playback.
  • Computer-synthesized or computer-generated audio components can be created with or modified to have signal characteristics to allow placement of various audio components at their desired respective positions in the listening environment.
  • The Dolby (or other like) components can be modified to have signal characteristics to allow apparent placement of various audio components at their desired respective positions in the listening environment.
  • Consider, for example, a computer-generated audio/video experience such as a video game.
  • the user is typically immersed into a world with the gaming action occurring around the user in that world in three dimensions.
  • the gamer may be in a battlefield environment that includes aircraft flying overhead, vehicles approaching from or departing to locations around the user, other characters sneaking up on the gamer from behind or from the side, gunfire at various locations around the player, and so on.
  • the gamer is in the cockpit of the vehicle. He or she may hear engine noise from the front, exhaust noise from the rear, tires squealing from the front or rear, the sounds of other vehicles behind, to the side and front of the gamer's vehicle, and so on.
  • volume alone is not the only factor used to judge distance.
  • the character of a given sound beyond its volume changes as the source of the given sound moves farther away.
  • the effects of the environment are more pronounced, for example.
  • the user can be immersed in a three-dimensional audio experience using only two "speakers" or emitters. For example, increasing the gain of an audio component on the left channel relative to the right, and at the same time adding a phase delay on that audio component for the right channel relative to the left, will make that audio component appear to be positioned to the left of the user. Increasing the gain or phase differential (or both) will cause the audio component to appear as if it is coming from a position farther to the left of the user.
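The left-placement example above (more gain on the left channel, a delay on the right) can be sketched on plain sample lists. This is a rough illustration, not the encoder of the disclosure: the function name `place_left`, the decibel gain parameter, and the use of a whole-sample time delay in place of a phase delay are all assumptions.

```python
def place_left(component, gain_db, delay_samples):
    """Split a mono audio component into (left, right) channels.

    The left channel is boosted by `gain_db` decibels and the right channel
    is delayed by `delay_samples` samples, so the component is perceived
    toward the left. Larger values push it farther left.
    """
    gain = 10 ** (gain_db / 20)                  # decibels to linear amplitude
    left = [gain * s for s in component]
    right = ([0.0] * delay_samples + list(component))[:len(component)]
    return left, right
```

Calling `place_left(samples, 0.0, 0)` leaves both channels identical, which corresponds to a source centered between the two emitters.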
  • each footstep of that character may be encoded differently to reflect that footstep's position relative to the prior or subsequent footsteps of that character.
  • the footsteps can be made to sound like they are moving toward the gamer from a predetermined location or moving away from the gamer to a predetermined position.
  • the volume of the footstep sound components can be likewise adjusted to reflect the relative distance of the footsteps as they approach or move away from the user.
  • a sequence of audio components that make up an event can be created with the appropriate phase, gain, or other difference to reflect relative movement.
  • the audio characteristics of a given audio component can be altered to reflect the changing position of the audio component.
  • the engine sound of the overtaking vehicle can be modified as the vehicle overtakes the gamer to position sound properly in the 3-D environment of the game. This can be in addition to any other alteration of the sound such as, for example, to add Doppler effects for additional realism.
  • stereo separation can be used to simulate the perception of distance by mixing an audio component between two audio channels so that the audio component is heard by both ears of the listener.
  • Components can be decoded into their constituent parts, and the constituent parts can be re-encoded according to the systems and methods described herein to provide correct spatial placement of the audio components and recombined into a two-channel audio signal for playback using two ultrasonic emitters.
  • FIG. 2 is a diagram illustrating an example of a system for generating two-channel, multidimensional audio from a surround-sound encoded signal in accordance with one embodiment of the systems and methods described herein.
  • the example audio system includes an audio encoding system 111 and an example audio playback system 113.
  • the example audio encoding system 111 includes a plurality of microphones 112, an audio encoder 132 and a storage medium 124.
  • the plurality of microphones 112 can be used to capture audio content as it is occurring.
  • A plurality of microphones can be placed about a sound environment to be recorded. For example, for a concert a number of microphones can be positioned about the stage or within the theater to capture sound as it is occurring at various locations in the environment. Audio encoder or surround sound encoder 132 processes the audio received from the different microphone input channels to create a two-channel audio stream such as, for example, a left and right audio stream. This two-channel audio stream encoded with information for each of the tracks or microphone input channels can be stored on any of a number of different storage media 124 such as, for example, flash or other memory, magnetic or optical discs, or other suitable storage media.
  • signal encoding from each microphone is performed on a track-by-track basis. That is, the location or position information of each microphone is preserved during the encoding process such that during subsequent decoding and re-encoding (described below) that location or position information affects the apparent position of the audio playback signal components.
  • encoding performed by audio encoder 132 separates the audio information into tracks that are not necessarily tied to, or that do not necessarily correspond on a one-to-one basis with each of the individual microphones 112.
  • Audio components can be separated into various channels such as center front, left front, right front, left surround, right surround, left back surround, right back surround, and so on based on content rather than based on which microphone was used to record the audio.
  • An example of an audio encoder used to create multiple tracks of audio information encoded onto a two-track audio stream is a Dolby Digital or Dolby surround sound processor.
  • The audio recording generated by audio encoder 132 and stored on storage medium 124 can be, for example, a Dolby 5.1 or 7.1 audio recording.
  • the content can be synthesized and assembled using purely synthesized sound or a combination of synthesized and recorded sounds.
  • a decoder 134 and parametric encoder 136 are provided in the reproduction system 113.
  • The encoded audio content (in this case stored on media 124) is two-channel encoded audio content created by audio encoding system 111.
  • Decoder 134 is used to decode the encoded two-channel audio stream into the multiple different surround sound channels 141 that make up the audio content. For example, in an embodiment where multiple microphones 112 are used to record multiple channels of audio content, decoder 134 can re-create an audio channel 141 for each microphone channel 112.
  • Decoder 134 can be implemented as a Dolby decoder, and the surround sound channels 141 are the re-created surround sound speaker channels (e.g., left front, center, right front, and so on).
  • Parametric encoder 136 can be implemented as described above to split each surround sound channel 141 into a left and right channel, and to apply audio processing (in the digital or analog domain) to position the sound for each channel at the appropriate position in the listening environment. As described above, such positioning can be accomplished by adjusting the phase, delay, gain, echo, reverb and other parameters of the left channel relative to the right channel or of both channels simultaneously for a given surround sound effect.
  • This parametric encoding for each channel can be performed on each of the surround sound channels 141, and the left and right components of each of the surround sound channels 141 combined into a composite left and right channel for reproduction by ultrasonic emitters 144. With such processing, the surround sound experience can be produced in a listening environment using only two emitters (i.e., speakers), rather than requiring 5-7 (or more) speakers placed about the listening environment.
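The per-channel split and the composite mix can be sketched as follows. This is a simplified illustration that models placement with gain and whole-sample delay only; the function names, the channel names, and the `(left_gain, right_gain, left_delay, right_delay)` placement tuple are hypothetical and chosen for the example.

```python
def delayed(signal, n):
    """Delay a list of samples by n samples, zero-padding the start."""
    return ([0.0] * n + list(signal))[:len(signal)]


def encode_channel(signal, left_gain, right_gain, left_delay, right_delay):
    """Split one surround channel into a (left, right) pair for two emitters."""
    left = [left_gain * s for s in delayed(signal, left_delay)]
    right = [right_gain * s for s in delayed(signal, right_delay)]
    return left, right


def mix_down(channels, placements):
    """Sum the left and right parts of every surround channel.

    channels:   {name: list of samples}
    placements: {name: (left_gain, right_gain, left_delay, right_delay)}
    """
    length = max(len(s) for s in channels.values())
    mix_left = [0.0] * length
    mix_right = [0.0] * length
    for name, signal in channels.items():
        left, right = encode_channel(signal, *placements[name])
        for i, (l, r) in enumerate(zip(left, right)):
            mix_left[i] += l
            mix_right[i] += r
    return mix_left, mix_right
```

A center channel would typically use identical gains and delays on both sides, while a left surround channel would favor the left emitter and delay the right.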
  • FIG. 3 is a diagram illustrating an example process for generating multi-dimensional audio content in accordance with one embodiment of the systems and methods described herein.
  • surround sound encoded audio content is received, in the form of an audio bitstream.
  • A two-channel Dolby encoded audio stream can be received from a program source such as, for example, a DVD, Blu-ray Disc, or other program source.
  • The surround-sound encoded audio stream is decoded, and the separate channels are available for processing. In various embodiments, this can be done using conventional Dolby decoding that separates an encoded audio stream into the various individual surround channels.
  • the resulting audio streams for each channel can include digital or analog audio content.
  • the desired location of these channels is identified or determined. In other words, for example, in terms of a Dolby 7.1 audio content, the desired position for the audio for each of the left front, center front, right front, left surround, right surround, back left surround and back right surround channels is determined.
  • A digitally encoded Dolby bitstream can be received, for example, from a program source such as a DVD, Blu-ray, or other audio program source.
  • the channels are processed to "place" each audio channel at the desired location in the listening field.
  • Each channel is divided into two channels (for example, a left and a right channel), and the appropriate processing is applied to provide spatial context for the channel.
  • This can involve adding a differential phase shift, gain, echo, reverb, or other audio parameter to each channel relative to the other for each of the surround channels to effectively place the audio content for that channel at the desired location in the listening field.
  • no phase or gain differentials are applied to the left and right channels so that the audio appears to be coming from between the two emitters.
  • the audio content is modulated to ultrasonic frequencies and played through the pair of parametric emitters.
  • Parametric processing is performed with the assumption that the pair of parametric emitters will be placed like conventional stereo speakers, i.e., in front of the listener and separated by a distance to the left and right of the center line from the listener.
  • processing can be performed to account for placement of the parametric emitters at various other predetermined locations in the listening environment. By adjusting parameters such as the phase and gain of the signal being sent to one emitter relative to the signal being sent to the other emitter, placement of the audio content can be achieved at desired locations given the actual emitter placement.
  • FIG. 4 is a diagram illustrating an example process for generating and reproducing multidimensional audio content using parametric emitters in accordance with one embodiment of the systems and methods described herein.
  • An example application for the process shown in the embodiment of FIG. 4 is an application in the video game environment.
  • Various audio objects are created with their positional or location information already built in or embedded such that when played through a pair of parametric emitters, the sound of each audio object appears to be originating from the predetermined desired location.
  • an audio object is created.
  • an audio object can be any of a number of audio sounds or sound clips such as, for example, a footstep, a gunshot, a vehicle engine, or a voice or sound of another character, just to name a few.
  • The developer determines the location of the audio object source relative to the listener position. For example, at any given point in a war game, the game may generate the sound of gunfire (or other action) emanating from a particular location. Consider the case of gunfire originating from behind and to the left of the gamer's current position.
  • the audio object (gunfire in this example) is encoded with the location information such that when it is played to the gamer using the parametric emitters, the sound appears to emanate from behind and to the left of the gamer. Accordingly, when the audio object is created, it can be created as an audio object having two channels (e.g., left and right channels) with the appropriate phase and gain differentials, and other audio characteristics, to cause the sound to appear to be emanating from the desired locations.
  • the sound can be prestored as library objects with the location information or characteristics already embedded or encoded therein such that they can be called from the library and used as is.
  • generic library objects are stored for use, and when called for application in a particular scenario are processed to apply the position information to the generic object.
  • gunfire sounds from a particular weapon can be stored in a library and, when called, processed to add the location information to the sound based on where the gunfire is to occur relative to the gamer's position.
  • the audio components with the location information are combined to create the composite audio content, and at step 333 the composite audio content is played to the user using the pair of parametric emitters.
  • Figure 5A is a diagram illustrating an example processing module of a parametric encoder that may be implemented to encode a sound channel 410 baseband audio signal into left and right ultrasonic frequency modulated output channel signals for processing and transmission by ultrasonic processors/emitters 450A and 450B.
  • the system may receive left and right channels for processing such as, for example, in a stereo sound environment.
  • a sound channel 410 can be divided into two component channels (left and right) for processing.
  • circuitry 400 comprises channel processors 420A and 420B for processing the left and right channels relative to each other to effectively place the audio content of the sound channels at the desired location in the listening field.
  • channel processors 420A and 420B comprise head-related transfer function (HRTF) filters for encoding the sound channel in three dimensional space based on the expected response of a listener who is listening to the sound emitted from a plurality of ultrasonic emitters.
  • Circuitry 400 may also include combiners 430A, 430B and ultrasonic modulators 440A, 440B.
  • the combiners may be included to cancel some or all of the acoustic crosstalk that may occur between ultrasonic emitters 450A and 450B.
  • the ultrasonic modulators modulate each output left and output right channel audio signal 405A and 405B onto an ultrasonic carrier at ultrasonic frequencies.
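The modulation step can be illustrated with conventional double-sideband amplitude modulation. The text above does not specify the modulation scheme, so this choice, the function name `modulate`, and the `depth` parameter are assumptions made for the sketch.

```python
import math


def modulate(audio, carrier_hz, sample_rate, depth=0.5):
    """Amplitude-modulate audio samples (range -1..1) onto an ultrasonic carrier.

    Uses double-sideband AM with carrier: (1 + depth * x[n]) * cos(2*pi*fc*n/fs).
    The sample rate must be more than twice the carrier frequency.
    """
    return [(1.0 + depth * x) * math.cos(2 * math.pi * carrier_hz * n / sample_rate)
            for n, x in enumerate(audio)]
```

In the arrangement described above, one such modulator would be applied to each of the left and right output channel signals before they reach their respective emitters.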
  • Prior to encoding, a head-related transfer function (HRTF) is calibrated for the left and right ears of the listener of the audio content to more accurately synthesize the 3D sound source. Because different individual listeners have different geometries (e.g. torso, head, and pinnae) with different sound reflection and diffraction properties, their ears will respond differently to sound received from the same point in space.
  • the calibrated HRTF estimates the response of a listener's ears relative to a sound source's (e.g. ultrasonic emitter) frequency, position in space, and delay.
  • the HRTF is a function of the sound source's frequency, delay, distance from the listener, azimuth angle relative to the listener, and elevation angle relative to the listener.
  • the HRTF in some embodiments can be implemented to specify a plurality of finite impulse response (FIR) filter pairs, one for the left ear and one for the right ear, each filter pair placing a sound source at a specific position
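As a rough illustration of the FIR filter pair idea, the sketch below filters one mono signal through a left-ear filter and a right-ear filter. The tap values and the names `fir_filter`, `apply_hrtf_pair`, `LEFT_TAPS`, and `RIGHT_TAPS` are hypothetical placeholders chosen for readability; they are not measured HRTF data and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def apply_hrtf_pair(
    mono: Sequence[float],
    left_taps: Sequence[float],
    right_taps: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Filter one source through the left-ear and right-ear FIR filters."""
    return fir_filter(mono, left_taps), fir_filter(mono, right_taps)


# Hypothetical toy filter pair for a source on the listener's right:
# the left ear hears it two samples later and at half the level.
LEFT_TAPS = [0.0, 0.0, 0.5]
RIGHT_TAPS = [1.0]

impulse = [1.0, 0.0, 0.0, 0.0]
left, right = apply_hrtf_pair(impulse, LEFT_TAPS, RIGHT_TAPS)
```

A real filter pair would have many more taps and would be looked up per target position (azimuth, elevation, distance); the toy pair above only captures an interaural delay and level difference.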
  • the HRTF is calibrated for the listener by selecting a HRTF profile from a predetermined set of HRTF profiles stored on a computer readable medium.
  • each predetermined HRTF profile may be based on a model listener's geometry, for example, the model listener's head, pinnae, and torso measurements.
  • the listener's geometry may be compared against the geometry of each of the HRTF profiles.
  • a HRTF profile may be automatically selected by choosing the HRTF profile whose model listener's geometry most closely resembles the listener's own geometry.
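One way to picture the automatic selection is a nearest-neighbor lookup over stored geometry measurements. The sketch below is only an assumption about how such a comparison could be done: the profile names, the three measurements, their values (in centimeters), and the unweighted Euclidean distance are all hypothetical and are not specified by the patent.

```python
import math
from typing import Dict

Geometry = Dict[str, float]

# Hypothetical profile store keyed by profile name (measurements in cm).
PROFILES: Dict[str, Geometry] = {
    "small": {"head_width": 14.0, "pinna_height": 5.8, "torso_width": 36.0},
    "medium": {"head_width": 15.2, "pinna_height": 6.4, "torso_width": 40.0},
    "large": {"head_width": 16.5, "pinna_height": 7.0, "torso_width": 45.0},
}


def geometry_distance(a: Geometry, b: Geometry) -> float:
    """Unweighted Euclidean distance between two sets of measurements."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def select_profile(listener: Geometry, profiles: Dict[str, Geometry]) -> str:
    """Return the name of the profile whose model geometry is closest."""
    return min(profiles, key=lambda name: geometry_distance(listener, profiles[name]))


# Hypothetical measurements, e.g. as produced by an optical scan.
listener = {"head_width": 15.0, "pinna_height": 6.5, "torso_width": 41.0}
selected = select_profile(listener, PROFILES)
```

With unweighted distances the largest measurement (here the torso width) dominates the comparison, so a practical system would likely normalize or weight each measurement.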
  • the HRTF profile may be manually selected from the predetermined set of HRTF profiles.
  • the listener may store a custom HRTF profile on the computer readable medium.
  • an optical imaging system is used to determine the geometry (e.g. head, pinnae, and torso) of the listener for comparison against the predetermined set of HRTF profiles.
  • the optical imaging system may include an optical profilometer with a digital camera and scanning light source. The scanning light source scans the listener's head, pinnae, and torso at a predetermined frequency for a predetermined amount of time, thereby generating approximate measurements of the listener's geometry (e.g. head, pinnae, and torso).
  • the optical imaging system may be based on other known dynamic 3D body scanning technologies.
  • the optical imaging system may include a depth sensor such as a stereoscopic vision-based or structured light-based sensor.
  • the depth sensor measures the listener's position relative to the ultrasonic emitters.
  • the selected HRTF profile may be further refined by using the ultrasonic emitters to play a plurality of sound samples.
  • the optical imaging system may record the listener's position relative to the left and right ultrasonic emitters.
  • the listener is asked to select the perceived location (relative to the listener) of the sound sample. Based on the listener's selections, and the listener's corresponding recorded positions, the parameters of the selected HRTF profile may be refined.
  • the listener wears headphones connected to the parametric audio system.
  • the left and right earpieces of the headphones each include one or more microphones.
  • the ultrasonic emitters play a plurality of sound samples at a plurality of different virtual locations.
  • the headphones record the sound samples at the listener's ears.
  • the recorded sound samples are compared with the original sounds. Based on this comparison and the listener's recorded positions, the HRTF may be calibrated.
  • the parametric audio system may save the listener's HRTF profile for subsequent uses. For example, when the listener
  • the parametric audio system may comprise a biometric sensor, an imaging sensor (e.g. a camera of an optical imaging system), or other sensing apparatus, that automatically detects the identity of the listener and loads the saved HRTF for that listener.
  • Parametric encoder circuitry 400 will now be described with respect to Figure 5B, which is an operational flow diagram illustrating an example method of encoding a sound channel 410.
  • the example encoding process may be applied to a plurality of surround sound channels 410 that make up an original audio content. For example, where multiple microphones were used to record multiple channels of audio content, coder 400 re-creates a sound channel 410 for each microphone channel and encodes it into a left and right channel for producing a three
  • parametric encoder 400 is included in this example to divide the sound channel into left and right input channel signal components 401A and 401B.
  • the right and left channel signals are encoded with location information that specifies a desired location (azimuthal, elevation, and distance) in the listening field environment of the listener. This can be done, for example, using the techniques described above with reference to Figure 2.
  • left and right input channel signals 401A and 401B are subsequently filtered using processing functions (e.g. HRTF filters) to generate a three dimensional sound effect when the signals are output by directional ultrasonic emitters 440A and 440B.
  • the audio content is provided as a stereo or other 2-channel signal and there is no need to split a sound channel into left and right channels. Accordingly, block 410 may be omitted in various embodiments. Although one advantage that may be obtained in various two-channel embodiments is the ability to achieve a multi-dimensional sound effect with only two audio channels, other embodiments can be implemented for audio content of greater than two channels.
  • channel processors 420A and 420B apply the calibrated HRTF filters to channel signals 401A and 401B, respectively, based on the desired 3D sound location
  • left channel processor 420A and right channel processor 420B may apply additional filters to the left and right channel signals to further enhance the 3D sound effect.
  • the system can be configured to adjust parameters such as the phase, delay, gain, reverb, echo, or other audio parameters, as described above, to enhance the 3D sound effect.
  • additional filters may be applied based on characteristics of the listening environment such as the listening environment's physical configuration, the listening
  • acoustic crosstalk cancellation filters are applied to the adjusted left and right channel signals to generate left and right output channel signals.
  • Figure 5A illustrates one specific example implementation of these filters for audio modulated ultrasonic signals.
  • the phase, frequency, and amplitude of the output beams can be assumed approximately constant.
  • the audio signal for one of the two channels is inverted and the delay adjusted for one of the channels relative to the other.
  • left combiner 430A performs a phase inversion of signal 403B, delays the inverted signal 403B, and combines it with signal 403A, resulting in output left channel signal 405A.
  • the delay accounts for the difference in time between the arrival of the canceling and interfering signals at the listener's left ear. In one embodiment, this delay may be determined based on the optical imaging system recording the listener's interaural distance (i.e. ear separation) and the listener's position relative to the left and right ultrasonic emitters.
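The delay can be estimated from geometry as a path-length difference divided by the speed of sound. The sketch below assumes a 2D room layout, a fixed speed of sound of about 343 m/s, and hypothetical emitter and ear coordinates in meters; the function names are illustrative and do not come from the patent.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def crosstalk_delay_seconds(own_emitter: Point, other_emitter: Point, ear: Point) -> float:
    """Extra travel time of the interfering beam (from the other emitter)
    relative to the canceling beam (from the ear's own emitter)."""
    path_difference = math.dist(other_emitter, ear) - math.dist(own_emitter, ear)
    return path_difference / SPEED_OF_SOUND_M_S


def delay_in_samples(delay_s: float, sample_rate_hz: int) -> int:
    """Round a delay in seconds to a whole number of samples."""
    return round(delay_s * sample_rate_hz)


# Hypothetical layout: emitters 2 m apart, listener centered 2 m away,
# ears 0.18 m apart (the interaural distance).
left_emitter, right_emitter = (-1.0, 0.0), (1.0, 0.0)
left_ear, right_ear = (-0.09, 2.0), (0.09, 2.0)

delay_left_ear = crosstalk_delay_seconds(left_emitter, right_emitter, left_ear)
delay_right_ear = crosstalk_delay_seconds(right_emitter, left_emitter, right_ear)
samples = delay_in_samples(delay_left_ear, 192_000)
```

For a centered listener the two delays are equal by symmetry; as the listener moves off-center they diverge, which is why the listener's recorded position matters.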
  • phase inversion and delay can be performed by processing blocks other than the combiner. For example, phase delay and inversion can be performed by left or right channel processors 420A, 420B. When the ultrasonic beams collide, the left channel audio is cancelled out via destructive interference and does not become audible when the beams intersect.
  • right channel audio may be cancelled out if right channel combiner 430B phase inverts signal 403A, delays the inverted signal 403A, and combines it with signal 403B, thereby generating output signal 405B.
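The invert, delay, and combine steps performed by the two combiners can be sketched on sample lists as follows. This is a simplified illustration under stated assumptions: a whole-sample delay, a single gain factor, and no room filtering. The helper names and the short test signals are hypothetical.

```python
from typing import List, Sequence


def delay_signal(signal: Sequence[float], delay: int) -> List[float]:
    """Delay a signal by a whole number of samples, keeping its length."""
    delay = min(max(delay, 0), len(signal))
    return [0.0] * delay + list(signal[: len(signal) - delay])


def combine(own: Sequence[float], other: Sequence[float], delay: int, gain: float = 1.0) -> List[float]:
    """Phase-invert the other channel, delay it, and add it to this channel."""
    inverted = [-gain * x for x in other]
    return [a + b for a, b in zip(own, delay_signal(inverted, delay))]


# Hypothetical short signals standing in for 403A (left) and 403B (right).
signal_403a = [1.0, 0.0, 0.0, 0.0]
signal_403b = [0.0, 1.0, 0.0, 0.0]

# Left combiner: 403A plus inverted, delayed 403B gives the left output.
output_405a = combine(signal_403a, signal_403b, delay=1)
# Right combiner: 403B plus inverted, delayed 403A gives the right output.
output_405b = combine(signal_403b, signal_403a, delay=1)
```

In this toy case the right output cancels to zero because the delayed, inverted left signal lines up exactly with the right signal; real signals would only cancel the portion that matches the crosstalk path.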
  • the reflection and filtering properties of the listening environment may also be considered as filter parameters for combiners 430A and 430B.
  • the left and right output channel audio signal frequencies are modulated or upconverted to ultrasonic frequencies using left ultrasonic modulator 440A and right ultrasonic modulator 440B.
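The text does not say which modulation scheme is used here, so the sketch below assumes plain double-sideband amplitude modulation as one possible illustration. The 40 kHz carrier, 192 kHz sample rate, modulation depth, and the function name `modulate_am` are all assumptions, and the audio is assumed to be normalized to the range -1 to 1.

```python
import math
from typing import List, Sequence


def modulate_am(
    audio: Sequence[float],
    sample_rate_hz: float,
    carrier_hz: float,
    depth: float = 0.8,
) -> List[float]:
    """Amplitude-modulate normalized audio onto a sine carrier."""
    if carrier_hz >= sample_rate_hz / 2:
        raise ValueError("carrier must be below the Nyquist frequency")
    out = []
    for n, sample in enumerate(audio):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate_hz)
        out.append((1.0 + depth * sample) * carrier)
    return out


SAMPLE_RATE_HZ = 192_000  # assumed; must exceed twice the carrier frequency
CARRIER_HZ = 40_000       # assumed ultrasonic carrier

# One millisecond of a 1 kHz test tone as a stand-in for an output channel.
tone = [math.sin(2.0 * math.pi * 1_000 * n / SAMPLE_RATE_HZ) for n in range(192)]
modulated = modulate_am(tone, SAMPLE_RATE_HZ, CARRIER_HZ)
```

The same function would be applied separately to the left and right output channels before they are passed to their respective emitters.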
  • the ultrasonic-frequency modulated output signals may subsequently be played by ultrasonic emitters.
  • the modulated left output channel signal is received by left ultrasonic processor/emitter 450A and the modulated right output channel signal is received by right ultrasonic processor/emitter 450B.
  • the ultrasonic processors respectively convert the received signals to ultrasonic beams for output by the emitters, thereby generating a realistic and substantially noise-free 3D sound effect in the listening field environment of the listener.
  • processors/emitters 450A, 450B can comprise an amplifier and an ultrasonic emitter such as, for example, a conventional piezo or electrostatic emitter. Examples of filtering, modulation and amplification, as well as example emitter configurations are described in United States Patent No. 8,718,297, titled Parametric Transducer and Related Methods, which is incorporated herein by reference in its entirety.
  • the disclosed use of 1) HRTF filters for 3D sound production and 2) acoustic crosstalk cancellation filters is made effective by the disclosed ultrasonic emitters, which emit focused sound beams (e.g., audio modulated ultrasonic signals) with approximately constant amplitude, phase, and frequency components as the beams propagate through and demodulate in the listening environment.
  • FIGs. 6A and 6B are diagrams illustrating example implementations of the
  • two parametric emitters are illustrated as being included in the system, left front and right front ultrasonic emitters, LF and RF, respectively.
  • other quantities of emitters or channels can be used.
  • the left and right emitters are placed such that the sound is directed toward the left and right ears, respectively, of the listener or listeners of the video game or other program content.
  • Alternative emitter positions can be used, but positions that direct the sound from each ultrasonic emitter LF, RF, to the respective ear of the listener(s) allow spatial imagery as described herein.
  • the ultrasonic emitters LF, RF are placed such that the ultrasonic frequency emissions are directed at the walls (or other reflective structure including the ceiling or floor) of the listening environment.
  • a virtual speaker or sound source is created.
  • the resultant audio waves are directed toward the ears of the listener(s) at the determined seating position.
  • the ultrasonic emitters can be combined with conventional speakers in stereo, surround sound or other configurations.
  • FIG. 7 is a diagram illustrating an example implementation of the multidimensional audio system in accordance with another embodiment of the systems and methods described herein.
  • the ultrasonic emitter configuration of FIG. 5B is combined with a conventional 7.1 surround sound system.
  • the configuration of FIG. 5A can also be combined with a conventional 7.1 surround sound system.
  • an additional pair of ultrasonic emitters can be placed to reflect an ultrasonic carrier audio signal from the back wall of the environment, replacing the conventional rear speakers.
  • the emitters can be aimed to be targeted to a given individual listener's ears in a specific listening position in the room. This can be useful to enhance the effects of the system. Also, consider an application where one individual listener of a group of listeners is hard of hearing. Implementing hybrid embodiments (such as the example of FIG. 6) can allow the emitters to be targeted to the hearing impaired listener. As such, the volume of the audio from the ultrasonic emitters can be adjusted to that listener's elevated needs without needing to alter the volume of the conventional audio system. Where a highly directional audio beam is used from the ultrasonic emitters and targeted at the hearing impaired listener's ears, the increased volume from the ultrasonic emitters is not heard (or is only detected at low levels) by listeners who are not in the targeted listening position.
  • the ultrasonic emitters can be combined with conventional surround sound configurations to replace some of the conventional speakers normally used.
  • the ultrasonic emitters in FIG. 6 can be used as the LS, RS speaker pair in a Dolby 5.1, 6.1, or 7.1 surround sound system, while conventional speakers are used for the remaining channels.
  • the ultrasonic emitters may also be used as the back speakers BSC, BSL, BSR in a Dolby 6.1 or 7.1 configuration.
  • computing module 500 may represent, for example, computing or processing capabilities found within desktop, laptop and notebook computers; hand-held computing devices (PDA's, smart phones, cell phones, palmtops, etc.); mainframes,
  • Computing module 500 might also represent computing capabilities embedded within or otherwise available to a given device.
  • a computing module might be found in other electronic devices such as, for example, digital cameras, navigation systems, cellular telephones, portable computing devices, modems, routers, WAPs, terminals and other electronic devices that might include some form of processing capability.
  • Computing module 500 might include, for example, one or more processors, controllers, control modules, or other processing devices, such as a processor 504.
  • Processor 504 might be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic.
  • processor 504 is connected to a bus 502, although any communication medium can be used to facilitate interaction with other components of computing module 500 or to communicate externally.
  • Computing module 500 might also include one or more memory modules, simply referred to herein as main memory 508. For example, random access memory (RAM) or other dynamic memory might preferably be used for storing information and instructions to be executed by processor 504. Main memory 508 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Computing module 500 might likewise include a read only memory (“ROM”) or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. The computing module 500 might also include one or more various forms of information storage mechanism 510, which might include, for example, a media drive 512 and a storage unit interface 520. The media drive 512 might include a drive or other mechanism to support fixed or removable storage media 514.
  • storage media 514 might include, for example, a hard disk, a floppy disk, magnetic tape, cartridge, optical disk, a CD or DVD, or other fixed or removable medium that is read by, written to or accessed by media drive 512.
  • the storage media 514 can include a computer usable storage medium having stored therein computer software or data.
  • information storage mechanism 510 might include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into computing module 500.
  • Such instrumentalities might include, for example, a fixed or removable storage unit 522 and an interface 520.
  • Examples of such storage units 522 and interfaces 520 can include a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, a PCMCIA slot and card, and other fixed or removable storage units 522 and interfaces 520 that allow software and data to be transferred from the storage unit 522 to computing module 500.
  • Computing module 500 might also include a communications interface 524.
  • Communications interface 524 might be used to allow software and data to be transferred between computing module 500 and external devices.
  • Examples of communications interface 524 might include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX or other interface), a communications port (such as, for example, a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or other communications interface.
  • Software and data transferred via communications interface 524 might typically be carried on signals, which can be electronic, electromagnetic (which includes optical) or other signals capable of being exchanged by a given communications interface 524. These signals might be provided to communications interface 524 via a channel 528. This channel 528 might carry signals and might be implemented using a wired or wireless
  • a channel might include a phone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communications channels.
  • the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as, for example, memory 508, and storage devices such as storage unit 520, and media 514. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions, embodied on the medium, are generally referred to as “computer program code” or a “computer program product” (which may be grouped in the form of computer programs or other groupings). When executed, such instructions might enable the computing module 500 to perform features or functions of the present invention as discussed herein.
  • the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; the terms “a” or “an” should be read as meaning “at least one,” “one or more” or the like; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. Likewise, where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such
  • module does not imply that the components or functionality described or claimed as part of the module are all configured in a common package. Indeed, any or all of the various components of a module, whether control logic or other components, can be combined in a single package or separately maintained and can further be distributed in multiple groupings or packages or across multiple locations.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Systems and methods of using ultrasonic emitters to produce multi-dimensional parametric audio are provided. The systems and methods may be configured to determine HRTF filters for the left and right ears of a listener, using an optical imaging system to scan a profile of the listener. Audio content may be encoded into left and right channels to create a three-dimensional sound effect for the listener of the audio content by: processing the sound channel into left and right input channel signals; applying the HRTF filters and acoustic crosstalk cancellation filters to the left and right channel signals to generate left and right output channel signals; and modulating the frequencies of the left and right output channel signals onto an ultrasonic carrier.
PCT/US2014/050759 2013-08-12 2014-08-12 Multi-dimensional parametric audio system and method WO2015023685A1

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361864757P 2013-08-12 2013-08-12
US61/864,757 2013-08-12
US13/969,292 US20140050325A1 (en) 2012-08-16 2013-08-16 Multi-dimensional parametric audio system and method
US13/969,292 2013-08-16
US14/457,588 US9271102B2 (en) 2012-08-16 2014-08-12 Multi-dimensional parametric audio system and method
US14/457,588 2014-08-12

Publications (1)

Publication Number Publication Date
WO2015023685A1 true WO2015023685A1 (fr) 2015-02-19

Family

ID=52468636

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/050759 WO2015023685A1 2013-08-12 2014-08-12 Multi-dimensional parametric audio system and method

Country Status (1)

Country Link
WO (1) WO2015023685A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030147543A1 (en) * 2002-02-04 2003-08-07 Yamaha Corporation Audio amplifier unit
US20120314872A1 (en) * 2010-01-19 2012-12-13 Ee Leng Tan System and method for processing an input signal to produce 3d audio effects
US20130194107A1 (en) * 2012-01-27 2013-08-01 Denso Corporation Sound field control apparatus and program

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018084769A1 * 2016-11-04 2018-05-11 Dirac Research Ab Constructing an audio filter database using head-tracking data
WO2018084770A1 * 2016-11-04 2018-05-11 Dirac Research Ab Methods and systems for determining and/or using an audio filter based on head-tracking data
CN109997376A * 2016-11-04 2019-07-09 迪拉克研究公司 Constructing an audio filter database using head-tracking data
US10715945B2 2016-11-04 2020-07-14 Dirac Research Ab Methods and systems for determining and/or using an audio filter based on head-tracking data
CN112492380A * 2020-11-18 2021-03-12 腾讯科技(深圳)有限公司 Sound effect adjustment method, apparatus, device, and storage medium
CN112492380B * 2020-11-18 2023-06-30 腾讯科技(深圳)有限公司 Sound effect adjustment method, apparatus, device, and storage medium
CN112954581A * 2021-02-04 2021-06-11 广州橙行智动汽车科技有限公司 Audio playback method, system, and apparatus
WO2022166708A1 * 2021-02-04 2022-08-11 广州橙行智动汽车科技有限公司 Audio playback method, system, and apparatus, vehicle, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14755953

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14755953

Country of ref document: EP

Kind code of ref document: A1