WO2017165968A1 - A system and method for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources - Google Patents

Info

Publication number
WO2017165968A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
channel
virtual
channels
signal
Prior art date
Application number
PCT/CA2017/050384
Other languages
French (fr)
Inventor
Michael Godfrey
Original Assignee
Rising Sun Productions Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rising Sun Productions Limited
Publication of WO2017165968A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems of the pseudo five- or more-channel type, e.g. virtual surround
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTFs] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

  • the present invention relates to creating three-dimensional binaural audio. More particularly, the present invention relates to a system and method for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources which may be output to a listener over two or more channels, broadcast, shared or recorded for future playback.
  • a method for creating three-dimensional binaural audio from audio signals on a left stereo sound channel and a right stereo sound channel comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the left stereo sound channel or the right stereo sound channel of incoming audio or by combining replicated audio signals of the left stereo sound channel and the right stereo sound channel of incoming audio; assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the left stereo sound channel, the right stereo sound channel
  • a method for creating three-dimensional binaural audio from audio signals on three or more audio channels comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the three or more audio channels of incoming audio or by combining the audio signals on two or more of the three or more audio channels of incoming audio; assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the three or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result
  • HRTF head-related transfer function
  • the present invention provides a system for creating three-dimensional binaural audio from audio signals on two or more audio channels, the system comprising: a signal multiplier for creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the two or more audio channels of incoming audio or by combining the replicated audio signals on two or more of the two or more audio channels of incoming audio using a signal combiner; wherein each of the two or more audio channels and the virtual channels is assigned to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; one or more audio processors for processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; one or more head-related transfer function (HRTF) audio processors for processing
  • the present invention provides a method for creating three-dimensional binaural audio from an audio signal on a mono audio channel, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the mono audio channel of incoming audio; assigning each of the mono audio channel and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the mono audio channel and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or
  • Figure 1 is a system diagram for a system for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention
  • Figure 2 is a system diagram for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention
  • Figure 3 is a system diagram for creating three-dimensional binaural audio from audio signals on two or more audio channels according to an embodiment of the present invention
  • Figure 4 is a system diagram for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention
  • Figure 5 is a system diagram for creating three-dimensional binaural audio from audio signals on two or more audio channels according to an embodiment of the present invention
  • Figure 6 is a flow diagram of a method for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention
  • Figure 7 is a flow diagram of a method for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention.
  • Figure 8 is a flow diagram of a method for creating three-dimensional binaural audio from an audio signal on a mono audio channel according to an embodiment of the present invention.
  • a left channel 104 and a right channel 102 from a stereo sound source 103 may be output into a left signal 108 and a right signal 106.
  • the left signal 108 and the right signal 106 may go through an audio distribution bussing matrix 110.
  • the audio distribution bussing matrix 110 may output a left side signal 112 and a right side signal 114.
  • the left side signal 112 and the right side signal 114 may be processed using a processor 122.
  • the left side signal 112 and the right side signal 114 may be combined using a signal combiner 124 to create a center rear signal 128.
  • the center rear signal 128 may be processed using a processor 122.
  • the audio distribution bussing matrix 110 may also output a left signal 116 and a right signal 118.
  • the left signal 116 and the right signal 118 may be transmitted to a top channel 120.
  • the left signal 116 and the right signal 118 may be combined using a signal combiner 124 to create a Low Frequency Effects (LFE) channel 126.
  • the LFE channel 126 may be processed with a processor 122 and panned center 182.
  • the left signal 116 and the right signal 118 may be combined using a signal combiner 124 to create a center signal 130.
  • the center signal 130 may be processed using a processor 122.
  • the processed left side signal 112 may be panned rear 170.
  • the processed right side signal 114 may be panned rear 172.
  • the center rear signal 128 may be panned rear 174.
  • the processed left side signal 112, right side signal 114 and center rear signal 128 may also be processed using an ambience processor 132.
  • the left signal 116 may be panned front 176, the right signal 118 may be panned front 178 and the center signal 130 may be panned front 180.
  • the left signal 116, right signal 118 and center signal 130 may be processed using an ambience processor 132.
  • Each channel 112, 114, 116, 118, 126, 128, 130 has a bypass 160 activated by a switch 162 so that the audio signal on that channel may optionally pass through the ambience processor 132 without processing.
  • Each channel 112, 114, 116, 118, 126, 128, 130 may also be individually adjusted for gain 164.
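The per-channel bypass and gain controls described above can be pictured as a small channel strip. The following is a minimal sketch only: the function names and the toy ambience effect are hypothetical stand-ins and are not taken from the patent.

```python
def toy_ambience(samples):
    """Hypothetical stand-in for an ambience processor:
    adds a half-level copy of the previous sample."""
    return [s + 0.5 * (samples[i - 1] if i > 0 else 0.0)
            for i, s in enumerate(samples)]


def channel_strip(samples, ambience, bypass=False, gain=1.0):
    """Run one channel through the ambience stage unless bypassed,
    then apply the channel's individual gain."""
    processed = list(samples) if bypass else ambience(samples)
    return [gain * s for s in processed]


# The same impulse with the bypass switch on and off.
dry = channel_strip([1.0, 0.0, 0.0], toy_ambience, bypass=True, gain=0.5)
wet = channel_strip([1.0, 0.0, 0.0], toy_ambience, bypass=False, gain=0.5)
```

With the bypass engaged the signal is only scaled by the gain; with it disengaged the ambience stage runs first, so the two controls stay independent for every channel.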
  • each channel 112, 114, 116, 118, 126, 128, 130 may be processed by an HRTF processor 138 to create a two-channel output via a left channel output 142 and a right channel output 140.
  • the right channel output 140 and the left channel output 142 may be output via headphones with sonic configuration of a right side 152 and a left side 150 and a virtual left side 148, virtual right side 144, virtual rear 146, virtual center 154 and virtual subwoofer 156.
  • a left channel 204 and a right channel 202 from a stereo sound source 203 may be output into a left signal 208 and a right signal 206.
  • the left signal 208 and the right signal 206 may be divided and multiplied into a plurality of additional virtual audio channels by a signal multiplier 280.
  • the left signal 208 and right signal 206 may be divided and multiplied into a left front channel 214 and a right front channel 216.
  • the left front channel 214 may be output 260 and the level may be controlled 274.
  • the right front channel 216 may be output 254 and the level may be controlled 268.
  • the left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an equalization (EQ) processor 226 to create a center front channel 234.
  • the center front channel 234 may be output 262 and the level may be controlled 276.
  • the left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an EQ processor 226 to create an LFE channel 228.
  • the LFE channel 228 may be output 250 and the level may be controlled 264.
  • the left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an EQ processor 226 to create a center rear channel 248.
  • the center rear channel 248 may be output 256 and the level may be controlled 270.
  • the left signal 208 may be processed using an EQ processor 226 and a dynamic equalizer 230 or stereo image enhancer 232 to create a left rear channel 236.
  • the left rear channel 236 may be output 258 and the level may be controlled 272.
  • the right signal 206 may be processed using an EQ processor 226 and a dynamic equalizer 230 or stereo image enhancer 232 to create a right rear channel 238.
  • the right rear channel 238 may be output 252 and the level may be controlled 266.
  • a multichannel sound source 301 may output a left channel 304, a right channel 302, a center channel 308, a LFE channel 310, a left rear channel 312 and a right rear channel 314, wherein the channels 304, 302, 308, 310, 312, 314 may be distributed by audio signal distribution matrix 390.
  • the left rear channel 312 may be processed using an EQ processor 326 and a spatial processor 328, panned rear 362 and output 380.
  • the right rear channel 314 may be processed using an EQ processor 326 and a spatial processor 328, panned rear 356 and output 382.
  • the left channel 304 may be panned front 350, adjusted for gain 374 and output 380.
  • the right channel 302 may be panned front 354, adjusted for gain 370 and output 382.
  • the processed left rear channel 312 and right rear channel 314 may be combined using a signal combiner 324, processed, and adjusted for gain 372 to create a center rear channel 348.
  • the center rear channel 348 may be panned rear 360 and output 382.
  • the center channel 308 may be panned front 352 and output 380.
  • the LFE channel 310 may be processed using an EQ processor 326, panned center 358 and output 382.
  • a right channel 402 and a left channel 404 may go through an audio distribution bussing matrix 410 to create a plurality of additional audio channels.
  • the additional channels may be selected for processing by an individual or group of spatial processors 403 to enhance individual directional spatial properties prior to being processed by a variable ambience processor 405.
  • the variable ambience processor 432 may output a rear center channel 412, a center channel 414, an LFE channel 416, a right side channel 418, a left side channel 420, a right channel 422 and a left channel 424, which may be processed by an HRTF processor 438.
  • the HRTF processor 438 may convert incoming audio into multidimensional audio whose immersive output may be monitored via headphones with a right side 452 and a left side 450 or loudspeakers with a virtual center 454, a virtual left side 448, a virtual right side 444, a virtual rear center 446, and a virtual subwoofer 456.
  • a multichannel audio input 501 such as a DVD containing 5.1 channels of surround sound may be selected for processing by an individual or group of spatial processors 503 to enhance individual directional and spatial properties prior to being processed by a variable ambience processor 505.
  • the variable ambience processor 532 may output a rear center channel 512, a center channel 514, an LFE channel 516, a right side channel 518, a left side channel 520, a right channel 522 and a left channel 524, which may be processed by an HRTF processor 538.
  • the HRTF processor 538 may convert incoming audio into multidimensional audio whose immersive output may be monitored via headphones with a right side 552 and a left side 550 or loudspeakers with a virtual center 554, a virtual left side 548, a virtual right side 544, a virtual rear center 546, and a virtual subwoofer 556.
  • a method 600 for creating three-dimensional binaural audio from audio signals on a left stereo sound channel and a right stereo sound channel is shown.
  • a first step 605 includes creating one or more virtual channels, wherein for each virtual channel, the virtual channel is created by replicating the audio signal on the left stereo sound channel or the right stereo sound channel of incoming audio or by combining replicated audio signals of the left stereo sound channel and the right stereo sound channel of incoming audio.
  • virtual channels may be created for one or both of the right stereo sound channel and the left stereo sound channel (i.e. to be used in place of the original channel), in which case subsequent steps of the method 600 may proceed using the virtual channels corresponding to those original channels.
  • Step 610 includes assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point.
  • Step 615 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix.
  • Step 620 includes processing one or more of the audio signals on the left stereo sound channel, right stereo sound channel, and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio.
  • Step 625 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
  • Step 705 includes creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the three or more audio channels of incoming audio or by combining the audio signals on two or more of the three or more audio channels of incoming audio.
  • virtual channels may be created for one or more of the original channels (i.e. to be used in place of the original channel), in which case subsequent steps of the method 700 may proceed using the virtual channels corresponding to those original channels.
  • Step 710 includes assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point.
  • Step 715 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix.
  • Step 720 includes processing one or more of the audio signals on the three or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio.
  • Step 725 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
  • Step 805 includes creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the mono audio channel of incoming audio.
  • a virtual channel may be created for the original channel (i.e. to be used in place of the original channel), in which case subsequent steps of the method 800 may proceed using the virtual channel corresponding to the original channel.
  • Step 810 includes assigning each of the mono audio channel and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point.
  • Step 815 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix.
  • Step 820 includes processing one or more of the audio signals on the mono audio channel and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio.
  • Step 825 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
  • a method for creating three-dimensional, binaural audio from stereo, mono and multichannel sound sources by converting stereo two-channel audio into spatially immersive binaural audio may include replicating or electronically splitting two input channels of incoming audio into a plurality of additional virtual audio channels.
  • the two input channels may include a left channel and a right channel.
  • replicating as used herein is intended to encompass terms such as splitting, multiplying and copying to create either an identical or near-identical audio signal (which may vary due to factors such as signal loss, etc.) as would be appreciated by a person skilled in the field having knowledge of analog and digital audio processing.
  • the corresponding system-related term used is signal multiplier, which is intended to encompass the various mechanisms for creating a replicated signal including splitting, electrically dividing, multiplying, etc.
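Replication and combination of channels, as described above, can be illustrated on plain sample lists. This is a minimal sketch under stated assumptions: the function names are hypothetical, and halving the sum in `combine` is an illustrative choice to keep the result within the input range (the text only says the signals are combined).

```python
def replicate(signal, copies):
    """Signal multiplier: return `copies` independent, identical copies."""
    return [list(signal) for _ in range(copies)]


def combine(left, right):
    """Signal combiner: mix equal amounts of two equal-length signals.
    The 0.5 scaling is an assumption made here to avoid clipping."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [0.5 * (l + r) for l, r in zip(left, right)]


left = [0.0, 0.5, 1.0]
right = [1.0, 0.5, 0.0]

# Two virtual channels replicated from the left input, one combined channel.
left_rear, left_top = replicate(left, 2)
center = combine(left, right)
```

Because each replica is its own list, later per-channel processing of one virtual channel does not alter the original input or the other replicas.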
  • the method may also include assigning the channels to a dimensional spatial assignment matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding a listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a listening point.
  • the listening point may be generally in the center between the virtual sound sources.
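One way to picture the spatial assignment matrix is as a table mapping each channel to a position and to a direction facing the listening point. The sketch below is illustrative only: the coordinates, axis convention and channel names are assumptions, not values given in the text.

```python
import math

# Hypothetical positions (x, y, z) in metres with the listening point at
# the origin. Assumed axes: x to the right, y to the front, z upward.
POSITIONS = {
    "left": (-1.0, 1.0, 0.0),
    "right": (1.0, 1.0, 0.0),
    "center": (0.0, 1.0, 0.0),
    "left_rear": (-1.0, -1.0, 0.0),
    "right_rear": (1.0, -1.0, 0.0),
    "center_rear": (0.0, -1.0, 0.0),
    "top": (0.0, 0.0, 1.0),
}


def facing_direction(position, listening_point=(0.0, 0.0, 0.0)):
    """Unit vector pointing from a source position toward the listening point."""
    delta = [lp - p for p, lp in zip(position, listening_point)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        raise ValueError("source coincides with the listening point")
    return tuple(d / norm for d in delta)


# Each entry holds (position, direction facing the listening point).
matrix = {name: (pos, facing_direction(pos)) for name, pos in POSITIONS.items()}
```

Storing the direction alongside the position keeps the later per-channel processing able to look up both properties from a single table.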
  • the method may further include selecting for processing some or all of the channel outputs to enhance individual directional spatial properties, either individually or in groups, prior to the collective output being summarily processed by a single or group of HRTF audio processors.
  • the single or group of HRTF audio processors may convert incoming audio into multidimensional audio whose immersive output may be monitored via at least two equidistant loudspeakers or headphones.
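A common way to realise HRTF processing, assumed here rather than stated in the text, is to filter each channel with a pair of head-related impulse responses (one per ear) and sum the results into two outputs. The sketch below uses toy impulse responses that are NOT measured HRTF data; all names are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def mix_into(accumulator, samples):
    """Add samples into accumulator, growing it when needed."""
    if len(samples) > len(accumulator):
        accumulator.extend([0.0] * (len(samples) - len(accumulator)))
    for n, v in enumerate(samples):
        accumulator[n] += v


def binaural_downmix(channels, hrirs):
    """Filter every channel with its (left ear, right ear) impulse-response
    pair and sum the filtered signals per ear."""
    left_ear, right_ear = [], []
    for name, sig in channels.items():
        h_left, h_right = hrirs[name]
        mix_into(left_ear, convolve(sig, h_left))
        mix_into(right_ear, convolve(sig, h_right))
    return left_ear, right_ear


# Toy responses: the far ear hears a delayed, quieter copy.
HRIRS = {
    "left": ([1.0], [0.0, 0.5]),
    "right": ([0.0, 0.5], [1.0]),
}
channels = {"left": [1.0, 0.0], "right": [0.0, 0.0]}
left_out, right_out = binaural_downmix(channels, HRIRS)
```

An impulse on the left channel arrives at full level in the left output and one sample later at half level in the right output, which is the interaural time and level difference in miniature.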
  • the channels may be individually adjusted for gain in relation to each other.
  • the output may be received by an audio recording device, for example a computer configured to create digital audio files, which records the audio for playback at a later time.
  • the combined output of the two input channels and additional virtual audio channels may be processed together, each channel being selectable, for ambience, reverberation or room simulation, and may be adjusted for electronic gain, controlling audio channel levels in relation to each other and to the total output.
  • Each channel may be selected individually or in groups, before being output to a virtual surround HRTF audio processor.
  • the system and method may comprise at least five virtual channels, wherein the at least five virtual channels may be a left channel, right channel, center channel, left rear channel and right rear channel.
  • the system and method may comprise at least six virtual channels, wherein the at least six virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel and center rear channel.
  • the system and method may comprise at least seven virtual channels, wherein the at least seven virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel, center rear channel and low frequency effects (LFE) channel.
  • LFE low frequency effects
  • the system and method may comprise at least eight virtual channels, wherein the at least eight virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel, center rear channel, top channel and LFE channel.
  • the system and method may comprise more than eight virtual channels.
  • virtual is used herein to denote that the channel is created and exists within the audio processing equipment of the systems and methods, in contrast to incoming audio signals received or audio output signals which may be output to loudspeakers or other sound-emitting, audio capture, audio recording, broadcast or playback devices.
  • the at least five, six, seven, eight or more virtual channels may be created to "open up" or "unfold" original tracks. Unfolded channels may be processed for accentuating localization frequency bands and audio delays individually before the collective output may be summarily processed by the single or group of HRTF processors and down-mixed to three-dimensional, two-channel virtual surround audio, which may occur collectively and simultaneously.
  • created individualized tracks may be sent through processors generally independent of each other.
  • Each group may have specific parameters applied to the group, which may relate to each physical area in space around a listener. This may create an effect equating dimensional spatiality and an improved sense of dimensional realism.
  • a total signal may be run through HRTF processors or a set of HRTF processors designed to add spatial attributes to the signal, which may make processing of the signal by the listener's head more realistic in terms of localization to human hearing and vestibular systems.
  • Equal amounts of electronic signal originating from the left channel and right channel may combine to create a virtual channel.
  • the additional virtual channel may emit a signal comprising a sum of the left signal and right signal.
  • the signal emitted by the additional virtual channel may also be electronically processed to enhance frequencies between 1 kHz and 5 kHz.
  • the signal emitted by the additional virtual channel may be assigned spatially to a front center of the three-dimensional space surrounding a central listening point.
  • a signal emitted by an additional virtual channel may be electronically delayed by 6-10 milliseconds before being output, wherein the delayed signal may be assigned spatially to a center rear of the three-dimensional space surrounding a central listening point.
  • the signal emitted by an additional virtual channel may be electronically delayed by 4-9 milliseconds before being output, wherein the delayed signal may be assigned spatially to a top of the three-dimensional space surrounding a central listening point.
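The millisecond delays quoted above translate into a whole number of samples once a sample rate is fixed. The sketch below assumes a 48 kHz sample rate and picks one value inside each quoted range; both choices are illustrative and the function names are hypothetical.

```python
def delay_in_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000.0)


def delay(signal, delay_ms, sample_rate=48000):
    """Delay a signal by prepending silence."""
    return [0.0] * delay_in_samples(delay_ms, sample_rate) + list(signal)


source = [1.0, 0.5]
center_rear = delay(source, 8.0)  # a value within the 6-10 ms range
top = delay(source, 6.0)          # a value within the 4-9 ms range
```

At 48 kHz, 8 ms corresponds to 384 samples and 6 ms to 288 samples, so the rear channel in this sketch lags the top channel by 96 samples.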
  • Humans may locate sounds around themselves in a narrow band of frequencies ranging between 1 kHz and 4 kHz. Enhancing these frequencies may accentuate certain directional qualities and distinguish material between the originally input and the newly created audio tracks to better define specific elements within the audio content to the listener's liking. This may relate to more clarity between co-related channels. Enhancing the 1 kHz to 4 kHz band of frequencies may add a localizing component to the audio signal as humans are most aware of this band of frequencies within any audio program.
  • Audio delay processors placed on specific channels based on known delay times may make desired audio appear to emanate from different locations around the listener's head when desirable.
  • the signal emitted by a virtual channel may be electronically processed to enhance frequencies between 0 Hz and 200 Hz and to reduce frequencies above 200 Hz.
  • the signal emitted by this virtual channel may be assigned spatially to a LFE central position of the three-dimensional space surrounding a central listening point.
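Keeping content below 200 Hz while reducing content above it amounts to low-pass filtering. The sketch below uses a one-pole low-pass filter, which is a deliberately simple stand-in (a production LFE filter would normally be steeper); the 48 kHz sample rate, the halved sum and the function names are assumptions.

```python
import math


def one_pole_lowpass(signal, cutoff_hz=200.0, sample_rate=48000):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out


def lfe_channel(left, right, cutoff_hz=200.0, sample_rate=48000):
    """Combine left and right, then keep mostly the low frequencies."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    return one_pole_lowpass(mono, cutoff_hz, sample_rate)


# A constant (0 Hz) input passes; a sample-rate alternation is attenuated.
steady = lfe_channel([1.0] * 2000, [1.0] * 2000)
buzz = lfe_channel([1.0, -1.0] * 1000, [1.0, -1.0] * 1000)
```

The constant input settles at its original level, while the fastest possible alternation is reduced to a small fraction of its input amplitude.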
  • each electronic audio signal originating from the left channel and right channel, respectively may create an additional virtual channel.
  • Each additional virtual channel may comprise the signal originating from the left signal and the right signal, respectively.
  • Each additional single channel comprising the signal originating from the left signal and the right signal, respectively, may be electronically processed to enhance frequencies between 1 kHz and 5 kHz and assigned spatially to the left rear and the right rear, respectively, of the three-dimensional space surrounding a central listening point.
  • Each additional virtual channel comprising the signal originating from the left signal and the right signal, respectively, may be spatially enhanced via a processor with signal output from the right rear channel assignment and left rear channel assignment, respectively, to form individual left rear and right rear outputs.
  • the virtual channels assigned to the left rear channel and right rear channel signals may be initially multiplied into a replica of the original left signal and the original right signal.
  • the signals may be then processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further.
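The "upward bell curve" centred on 2 kHz can be approximated with a single peaking equalizer. The sketch below uses the widely published Audio EQ Cookbook peaking biquad formulas; the 6 dB boost, the Q of 0.7 and the 48 kHz sample rate are illustrative assumptions, not values from the text.

```python
import cmath
import math


def peaking_coeffs(f0, gain_db, q, fs):
    """Audio EQ Cookbook peaking filter, normalised so that a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def biquad(signal, b, a):
    """Direct form I second-order filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain_at(freq, b, a, fs):
    """Magnitude of the filter's frequency response at `freq`."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq / fs)
    return abs((b[0] + b[1] * z + b[2] * z * z) /
               (a[0] + a[1] * z + a[2] * z * z))


FS = 48000
b, a = peaking_coeffs(2000.0, 6.0, 0.7, FS)
boosted = biquad([1.0] + [0.0] * 63, b, a)  # impulse response of the EQ
```

Evaluating `gain_at` at 1 kHz, 2 kHz and 4 kHz shows the boost peaking at 2 kHz and tapering on both sides, while frequencies far below the band are left close to unity gain.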
  • the left rear channel and the right rear channel may be run through a stereo enhancement processor, such as a spatializer, to add clarity.
  • the left rear channel and the right rear channel may be variably processed together with an effect level that may be variable individually or as a group.
  • the left rear channel and the right rear channel may join remaining channels by either selectively being processed with a room simulation or a reverberation processor, or sent directly to the HRTF processor set and stereo output.
  • the virtual channel assigned to the center channel may be initially derived from a sum of the original left signal and the original right signal.
  • the combined monophonic signal may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further.
  • the channel gain may be set slightly higher in relation to the left channel and the right channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
  • Audio signal delay units may electronically delay a signal in time from input to output.
  • Reverberation units may be a type of audio delay, which may give a sense of space to incoming audio information.
  • the virtual channel assigned to the rear center channel may be initially derived from a sum of the original left signal and the original right signal.
  • the summed monophonic signal may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further.
  • the center rear channel may then run through a dynamic equalizer or other high frequency enhancement processor to add clarity and separation in the rear of the listener's head.
  • the signal may be electronically delayed by 6-10 milliseconds before being output.
  • the delayed signal may be assigned spatially to the center rear of the three-dimensional space surrounding a central listening point.
  • This channel gain may be set slightly higher or lower in relation to the left rear channel and right rear channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
  • the signal may be electronically delayed by 4-8 milliseconds before being output.
  • Dynamic equalization is a type of processing where an amount of tonal boost varies according to dynamics of a processed signal. Additional brightness may be achieved with dynamic equalization by adding a dynamic, high-frequency boost to sounds. Such dynamic effects may be dramatic as they may increase the tonal contrast within specific parts of music, rather than treating a whole mix the same. Most exciters or enhancers may combine elements of dynamic equalization with other processes, including harmonic synthesis and phase manipulation.
  • the virtual channel assigned to the LFE channel may be initially derived from a sum of the original left signal and right signal.
  • the combined monophonic signal may then be equalized such that a frequency range of the signal may be electronically processed to enhance frequencies between 0 Hz and 200 Hz and reduce frequencies above 200 Hz.
  • the signal may be assigned spatially to an LFE central position of the three-dimensional space surrounding a central listening point. If remaining desirable frequencies are completely diminished and remaining frequencies are given increased amplitude of 3 dB or more, the bass effect may be made more pronounced. Care may be exercised not to overdrive other stages with the channel level as the LFE may overdrive the HRTF processor.
  • the channel gain may be set in relation to other channel gains before being sent directly to the HRTF processor set.
  • the original left channel and original right channel may be processed initially via an additional independent stereo enhancement processor.
  • Original stereo tracks may provide a backbone to an overall mix and may be left alone until they are input to a final global processor with other new channels.
  • a cohesion, unification and sense of believability may be output to the listener.
  • a method for creating three-dimensional, two-channel binaural audio from stereo and multichannel surround sound may include electronically dividing and multiplying, i.e. replicating, a plurality of channels of incoming audio into a plurality of additional virtual audio channels.
  • the incoming audio may include at least a left channel and a right channel.
  • the method may also include assigning the channels to a
  • the method may further include selecting for processing some or all of the channel outputs to enhance individual directional sonic properties relating to enhancing a perceived directionality of sound, either individually or in groups, prior to individual audio outputs being summarily processed by a single or group of ambience processors.
  • the multi-directional output of the single or group of ambience processors may be monitored via at least five equidistant loudspeakers.
  • the channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being output to the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
  • a method for creating three-dimensional, two-channel binaural audio from monophonic (hereinafter "mono") sound sources by converting a mono audio signal to two or more channels of multichannel surround sound may include electronically dividing and multiplying (i.e. replicating) a single channel of incoming audio into a plurality of additional audio channels.
  • the method may include assigning the channels to a spatial location matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding, for example, a generally central listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a central listening point.
  • the method may further include selecting for processing some or all of the channel outputs to enhance individual directional sonic properties relating to enhancing a perceived directionality of sound, either individually or in groups, prior to individual audio outputs being summarily processed by a single or group of ambience processors.
  • the channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being output to the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
  • a method for creating three-dimensional, two-channel binaural audio from stereo and multichannel sound sources by converting multichannel audio signals to three-dimensional surround sound may include electronically dividing and multiplying (i.e. replicating) a plurality of at least four individual input channels of incoming audio.
  • the at least four individual input channels may be a front left channel, front right channel, left rear channel and right rear channel.
  • the method may also include assigning the channels to a dimensional spatial assignment matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding a generally central listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a central listening point.
  • the method may further include selecting for processing some or all of the channel outputs to enhance individual directional spatial properties, either individually or in groups, prior to the collective output being summarily processed by a single or group of HRTF audio processors.
  • the single or group of HRTF audio processors may convert incoming audio to multidimensional audio whose immersive stereo output may be monitored via at least two equidistant loudspeakers or headphones.
  • the channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being input into the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
  • the front left channel and the front right channel may be summed to form an individual front center channel.
  • the front center channel may be derived from generally equal amounts of the front left channel and front right channel.
  • the rear left channel and rear right channel may be summed together to form an individual rear center channel.
  • the rear center channel may be derived from generally equal amounts of the rear left channel and the rear right channel.
  • the method may comprise a plurality of at least six individual input channels of incoming audio.
  • the at least six individual input channels may be a front left channel, front right channel, left rear channel, right rear channel, front center channel and LFE channel.
  • Multichannel signals may be run through a process previously mixed or created for immersive three-dimensional audio delivery in four, five or more channels of sound and initially intended to be heard in a theatre system or a home theatre system, such as a DVD containing 5.1 channels of surround sound or a broadcast delivered in HDTV format containing 5.1 or 6.1 channel surround sound audio delivery means. This process may generally replace and work in substitute of the stereo input mode.
  • Individual incoming multichannels of audio may be initially assigned as labeled, left to left, right to right, left rear to left rear, etc. Groups or individuals whose position may equate to surrounding positions of loudspeakers in a multichannel speaker system are designed for surround sound playback audio, such as a home theater. Individual channel assignments may be processed individually with the stereo input mode where they may be processed to achieve a desired immersive audio localization effect.
  • the left rear channel and right rear channel signal may be untouched on input and may remain as the original incoming left rear signal and right rear signal.
  • the signals may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range.
  • the left rear channel and right rear channel may run through a spatializer or other stereo enhancement to add clarity.
  • the left rear channel and right rear channel may be processed together with an effect level that may be variable individually or as a group.
  • the left rear channel and right rear channel may join remaining channels by either selectively being processed with a room simulation or reverberation processor, or sent directly to the HRTF processor set and output.
  • a center channel may be assigned to a center channel and may contain the same information as contained within the original incoming material. If a center channel is not originally input, such as in a Dolby surround or four-channel surround mix, a channel assigned to the front center channel may be initially derived from a sum of the original left signal and original right signal. This combined monophonic signal may then be processed so that the frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further.
  • the channel gain may be set slightly higher in relation to the left front channel and right front channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
  • the channel assigned to the rear center channel may be initially derived from a sum of the original left rear signal and right rear signal.
  • the summed monophonic signal may then be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further.
  • the center rear channel may then run through a dynamic equalizer or other high frequency enhancement processor to add clarity and separation in the rear.
  • the signal may be electronically delayed by 6-10 milliseconds before being output.
  • the delayed signal may be assigned spatially to the center rear of the three-dimensional space surrounding a listening point.
  • This channel gain may be set slightly higher or lower in relation to the left rear channel and right rear channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
  • the signal may be electronically delayed by 4-8 milliseconds before being output.
  • an LFE channel may be assigned to an LFE channel and may contain the same information as contained within the original incoming material. If an LFE channel is not included in the incoming audio, a channel assigned to the LFE channel may be initially derived from a sum of the original left signal and original right signal. The summed monophonic signal may be processed such that a frequency range of the signal may enhance the frequencies between 0 Hz and 200 Hz and reduce frequencies above 200 Hz. The signal may be assigned spatially to an LFE central position of the three-dimensional space surrounding a listening point.
  • the bass effect may be made more pronounced. Care may be exercised not to overdrive other stages with the channel level as the LFE may overdrive the HRTF processor.
  • the channel gain may be set in relation to the other channels before being sent directly to the HRTF processor set.
  • the original left front channel and original right front channel may be processed initially via an additional independent stereo enhancement processor.
  • Original stereo tracks may provide a backbone to an overall mix until inputted to a final global processor with all newly created channels.
  • a cohesion, unification and sense of believability may be output to the listener.
  • Using high quality audio pre-amplifiers with clean sonic signal to boost the signal at the front-end of the process while acquiring incoming audio may help "open up" the sound for splitting. Additionally utilizing the benefits of tube or valve pre-amplification, whether it is analog electronics or simulated digitally, with its subtly added harmonics may further separate the incoming material into enhanced frequency divisions.
  • Some audio level compression used on the incoming audio also helps to smooth out the entire process.
  • Some gain makeup at the end of the process, including additional compression and equalization (EQ) may be used to make up for any frequency or amplitude deficiencies added by the total process and to smooth out any acquired amplitude transients that may have been added by additionally selected processors. Compression and EQ may be used to make the total three-dimensional mix sound more pleasing to the listener's ear or to the intended listening market.
  • Signal processors may enhance specific frequency bands or audible delays. After processing, an affected signal may differ audibly from the original signal. A signal boost of at least 3 dB in the described frequency range on certain channels may be required to hear a difference on the affected channels. The amount of boost above 3 dB may be variable. Processors commonly known in the art that may be used, and that may be substituted for one another if slightly different effects are desired, include the following.
  • An audio equalizer is a processor for adjusting the balance between frequency components within an audible electronic signal.
  • a harmonic generator may be an audio signal processing technique used to enhance a signal by dynamic equalization, phase manipulation, harmonic synthesis of high frequency signals, and through adding subtle harmonic distortion.
  • a harmonic generator may be further used to synthesize harmonics of low frequency signals to simulate deep bass in smaller speakers. Harmonic synthesis may involve creating higher order harmonics from fundamental frequency signals present in a recording.
  • An "exciter" processor may generate high frequency components that may not be part of an original signal by employing a non-linear distortion process resembling overdrive and distortion effects.
  • the "exciter" processor may pass an input signal through a high-pass filter before feeding the input signal into the harmonics (distortion) generator, which may result in artificial harmonics being added to the original signal.
  • the artificial harmonics added to the original signal may contain frequencies at least one octave above a threshold of the high-pass filter.
  • a distorted signal may be mixed with the original signal.
  • a stereophonic image enhancer may be used by a system and apparatus disclosed in U.S. Patent No. 5,412,731, entitled “AUTOMATIC STEREOPHONIC MANIPULATION SYSTEM AND APPARATUS FOR IMAGE ENHANCEMENT,” published on May 2, 1995, wherein the spatializer technology may manipulate the original signal for the listener to perceive a stereo image beyond boundaries of two loudspeakers and place sound in front of the listener in an arc of 180 degrees. As humans hear in 360 degrees and in a three-dimensional space, 180 degrees of audio may not be sufficient to supply immersive audio.
  • stereophonic image enhancement may be used for some specific channel sets to achieve a heightened sense of depth and space. For example, stereophonic image enhancement on rear channels may add immeasurably to the perceived space of a program.
  • HRTF processors are described in U.S. Patent No. 6,980,661, entitled “METHOD OF AND APPARATUS FOR PRODUCING APPARENT MULTIDIMENSIONAL SOUNDS," published on December 27, 2005.
  • HRTF processors may take incoming audio and create a three-dimensional effect over headphones.
  • HRTF processors may be commercially available from companies such as Dolby Laboratories, QSound, DTS, and Zoran Corporation. HRTF processors may focus on sound quality first, with adjustability and variability from source to source within the process, which may make the process more flexible and program-dependent as certain types of incoming music may work best with certain settings.
  • Targeting specific frequency bands that may relate to the localization function in a human nervous system may stimulate the human vestibular and localization systems. Resulting media may play back on any type of media player without degradation of the original signal or total signal loss when played through certain loudspeaker configurations via phase cancellation. Adding virtual height and center rear channels to the three-dimensional soundscape may result in an improvement over an audio signal that was encoded via MP3.
  • Any previously recorded stereo, mono or multichannel sound information may be used as a source.
  • the sound information may be processed for real-time playback via binaural headphones or other stereo listening or broadcast or recorded for future playback in enhanced three-dimensional binaural audio.
  • the sound information may be processed to provide a three-dimensional feeling of personal multichannel surround sound while maintaining integrity of the original material.
  • Resulting audio delivery may be of a high quality and may provide HRTF audio cues within an audio program required for the listener to internally process the audio and its location in a three-dimensional space.
  • the processed audio may be experienced by the listener over headphones or two equidistant speakers as situated in a triangle with reference to the listener, such as stereo speakers on a laptop computer.
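
By way of illustration only, the 1 kHz to 4 kHz enhancement with a bell-shaped peak at 2 kHz and the 4-8 ms and 6-10 ms delays described in the points above could be sketched as follows. This is a minimal sketch and not the disclosed implementation: the peaking-filter formulas follow the widely used Audio EQ Cookbook biquad design, and the 48 kHz sample rate, the 6 dB boost, the Q of 2/3 (roughly a two-octave bandwidth centered on 2 kHz) and all function names are assumptions chosen for the example.

```python
import cmath
import math


def peaking_biquad(fs, f0, gain_db, q):
    """Normalized (b0, b1, b2, a1, a2) for a peaking ("bell") EQ with its peak at f0."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    return (
        (1.0 + alpha * amp) / a0,
        -2.0 * math.cos(w0) / a0,
        (1.0 - alpha * amp) / a0,
        -2.0 * math.cos(w0) / a0,
        (1.0 - alpha / amp) / a0,
    )


def gain_db_at(coeffs, fs, freq):
    """Magnitude response of the biquad, in dB, at a single frequency."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 evaluated on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


def delay_samples(fs, delay_ms):
    """Number of whole samples corresponding to a delay given in milliseconds."""
    return int(round(fs * delay_ms / 1000.0))


def delay(samples, fs, delay_ms):
    """Delay a signal by prepending silence (the output is longer than the input)."""
    return [0.0] * delay_samples(fs, delay_ms) + list(samples)


if __name__ == "__main__":
    fs = 48000  # assumed sample rate
    bell = peaking_biquad(fs, 2000.0, 6.0, 2.0 / 3.0)
    for freq in (250.0, 1000.0, 2000.0, 4000.0, 16000.0):
        print(freq, round(gain_db_at(bell, fs, freq), 2))
    print(delay_samples(fs, 6.0), delay_samples(fs, 10.0))
```

A single peaking filter is used here because 2 kHz is the geometric center of the 1 kHz to 4 kHz range, so one bell centered there raises the whole band while pushing 2 kHz highest. A shelving or multi-band equalizer would be an equally valid reading of the text.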

Abstract

Described are systems and methods for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources. The systems and methods create and process live or previously recorded, stereo, mono and multichannel audio to create spatially immersive, binaural audio. Further, the method and system may provide an effect to an audio portion of an audiovisual program when linked with video program material contained within the same media or apply to audio-only information created independently. Stereophonic (two-channel) or multichannel (5.1, 6.1, 7.1, 10.2, 22.3, etc.) surround sound may be processed according to the method and converted to binaural stereo, in addition to enhancing the spatiality of the original mix for audio such as music, movies or broadcasts. Additional spatial cues are added on top of the original signal to create a natural feeling of spatial believability. This results in a more pleasingly immersive auditory and natural vestibular experience for the listener.

Description

A SYSTEM AND METHOD FOR CREATING THREE-DIMENSIONAL BINAURAL AUDIO FROM STEREO, MONO AND MULTICHANNEL SOUND SOURCES
FIELD OF THE INVENTION
[0001] The present invention relates to creating three-dimensional binaural audio. More particularly, the present invention relates to a system and method for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources which may be output to a listener over two or more channels, broadcast, shared or recorded for future playback.
BACKGROUND OF THE INVENTION
[0002] In recent years, the general adoption of portable media players and their increased use in the world finds 70 percent of humans listening to music on-the-go via headphones. This means that most people are equipped to receive immersive or enhanced stereo audio if delivered simply and directly to their headphones. Previous delivery methods to the consumer that were necessary in order to achieve any capability of experiencing personal immersive audio playback were complicated, cumbersome and certainly not portable.
[0003] Although the technology to deliver a much higher quality listening experience to people exists, the relatively low quality MP3 audio file has taken over as the most common file type, likely due to its reduced file size and easy downloadability. Unfortunately, however, due to modern music production techniques, the delivered MP3 package does not usually sound very immersive at all. In this current media-hungry world, the demand for increased quality in both audio and video productions is increasing dramatically.
[0004] Accordingly, there remains a need for improvements in the art.
SUMMARY OF THE INVENTION
[0005] In accordance with an aspect of the invention, there is provided a method for creating three-dimensional binaural audio from audio signals on a left stereo sound channel and a right stereo sound channel, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the left stereo sound channel or the right stereo sound channel of incoming audio or by combining replicated audio signals of the left stereo sound channel and the right stereo sound channel of incoming audio; assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the left stereo sound channel, the right stereo sound channel, and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
[0006] In accordance with a further aspect of the invention, there is provided a method for creating three-dimensional binaural audio from audio signals on three or more audio channels, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the three or more audio channels of incoming audio or by combining the audio signals on two or more of the three or more audio channels of incoming audio; assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the three or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
[0007] According to a further embodiment, the present invention provides a system for creating three-dimensional binaural audio from audio signals on two or more audio channels, the system comprising: a signal multiplier for creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the two or more audio channels of incoming audio or by combining the replicated audio signals on two or more of the two or more audio channels of incoming audio using a signal combiner; wherein each of the two or more audio channels and the virtual channels is assigned to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; one or more audio processors for processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; one or more head-related transfer function (HRTF) audio processors for processing one or more of the audio signals on the two or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and one or more sound-emitting devices for receiving and emitting the two or more audio output signals as sound or an audio recording device for receiving and recording the two or more audio output signals for future playback or an audio distribution device for distributing the audio output signals.
[0008] According to a further embodiment, the present invention provides a method for creating three-dimensional binaural audio from an audio signal on a mono audio channel, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the mono audio channel of incoming audio; assigning each of the mono audio channel and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the mono audio channel and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
[0009] Other aspects and features according to the present application will become apparent to those ordinarily skilled in the art upon review of the following description of embodiments of the invention in conjunction with the accompanying figures.
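
For illustration, the first two steps common to the methods summarized above (creating virtual channels by replication or combination, and assigning every channel to a position with a direction facing the listening point) might be sketched as below. This is a hypothetical sketch only: the channel names, the azimuth angles, the unit radius, the 0.5 scaling applied when summing, and the function names are assumptions for the example and are not taken from the disclosure.

```python
import math

# Hypothetical spatial assignment: channel name -> (azimuth_deg, elevation_deg).
# Azimuth is measured clockwise from straight ahead of the listening point.
SPATIAL_ASSIGNMENT = {
    "left": (-30.0, 0.0),
    "right": (30.0, 0.0),
    "center": (0.0, 0.0),
    "left_rear": (-110.0, 0.0),
    "right_rear": (110.0, 0.0),
    "center_rear": (180.0, 0.0),
}


def create_virtual_channels(left, right):
    """Replicate or combine the stereo inputs into a set of named channels."""
    # Assumption: the sum is scaled by 0.5 so the combined signal cannot exceed
    # the range of its inputs. The source only states that the signals are summed.
    summed = [0.5 * (l + r) for l, r in zip(left, right)]
    return {
        "left": list(left),           # original channel, passed through
        "right": list(right),         # original channel, passed through
        "center": summed,             # virtual: combination of left and right
        "left_rear": list(left),      # virtual: replica of left
        "right_rear": list(right),    # virtual: replica of right
        "center_rear": list(summed),  # virtual: combination of left and right
    }


def position_and_direction(azimuth_deg, elevation_deg, radius=1.0):
    """Cartesian position of a virtual source and the unit vector facing the origin."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    position = (
        radius * math.sin(az) * math.cos(el),  # x: listener's right
        radius * math.cos(az) * math.cos(el),  # y: straight ahead
        radius * math.sin(el),                 # z: up
    )
    direction = tuple(-c / radius for c in position)
    return position, direction


if __name__ == "__main__":
    channels = create_virtual_channels([1.0, 0.0], [0.0, 1.0])
    for name in channels:
        print(name, position_and_direction(*SPATIAL_ASSIGNMENT[name]))
```

Keeping the assignment as a plain lookup table separates the question of where a channel sits from how it is later processed, which matches the order of the steps in the summary.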
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Reference will now be made to the accompanying drawings which show, by way of example only, embodiments of the invention, and how they may be carried into effect, and in which:
[0011] Figure 1 is a system diagram for a system for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention;
[0012] Figure 2 is a system diagram for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention;
[0013] Figure 3 is a system diagram for creating three-dimensional binaural audio from audio signals on two or more audio channels according to an embodiment of the present invention;
[0014] Figure 4 is a system diagram for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention;
[0015] Figure 5 is a system diagram for creating three-dimensional binaural audio from audio signals on two or more audio channels according to an embodiment of the present invention;
[0016] Figure 6 is a flow diagram of a method for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention;
[0017] Figure 7 is a flow diagram of a method for creating three-dimensional binaural audio from audio signals on stereo audio channels according to an embodiment of the present invention; and
[0018] Figure 8 is a flow diagram of a method for creating three-dimensional binaural audio from an audio signal on a mono audio channel according to an embodiment of the present invention.
[0019] Like reference numerals indicate like or corresponding elements in the drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0020] The detailed embodiments of the present invention are disclosed herein. It should be understood, however, that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. Therefore, the details disclosed herein are not to be interpreted as limiting, but merely as a basis for teaching one skilled in the art how to make and use the invention.
[0021] Referring to Figures 1 to 8, systems and methods for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources are described.
[0022] According to an embodiment as shown in Figure 1, in a first stage 101, a left channel 104 and a right channel 102 from a stereo sound source 103 may be output into a left signal 108 and a right signal 106. The left signal 108 and the right signal 106 may go through an audio distribution bussing matrix 110.
[0023] In a second stage 111, the audio distribution bussing matrix 110 may output a left side signal 112 and a right side signal 114. The left side signal 112 and the right side signal 114 may be processed using a processor 122. The left side signal 112 and the right side signal 114 may be combined using a signal combiner 124 to create a center rear signal 128. The center rear signal 128 may be processed using a processor 122. The audio distribution bussing matrix 110 may also output a left signal 116 and a right signal 118. The left signal 116 and the right signal 118 may be transmitted to a top channel 120. The left signal 116 and the right signal 118 may be combined using a signal combiner 124 to create a Low Frequency Effects (LFE) channel 126. The LFE channel 126 may be processed with a processor 122 and panned center 182. The left signal 116 and the right signal 118 may be combined using a signal combiner 124 to create a center signal 130. The center signal 130 may be processed using a processor 122. The processed left side signal 112 may be panned rear 170, the processed right side signal 114 may be panned rear 172 and the center rear signal 128 may be panned rear 174. The processed left side signal 112, right side signal 114 and center rear signal 128 may also be processed using an ambience processor 132. The left signal 116 may be panned front 176, the right signal 118 may be panned front 178 and the center signal 130 may be panned front 180. The left signal 116, right signal 118 and center signal 130 may be processed using an ambience processor 132. Each channel 112, 114, 116, 118, 126, 128, 130 has a bypass 160 activated by a switch 162 so that the audio signal on that channel may optionally pass through the ambience processor 132 without processing. Each channel 112, 114, 116, 118, 126, 128, 130 may also be individually adjusted for gain 164.
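
The per-channel bypass 160, switch 162 and gain 164 of Figure 1 could, purely as an illustration, be modeled in software as a small channel strip. The sketch below is an assumption-laden example rather than the disclosed circuit: `ChannelStrip`, `simple_ambience` and `combine` are hypothetical names, the ambience processor is replaced by a toy single-echo effect, and applying the gain after the ambience stage is an arbitrary ordering choice that the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]


def combine(a: Signal, b: Signal) -> Signal:
    """Toy signal combiner: sample-by-sample sum of two signals."""
    return [x + y for x, y in zip(a, b)]


def simple_ambience(samples: Signal, delay_samples: int = 3, mix: float = 0.5) -> Signal:
    """Toy stand-in for an ambience processor: adds one delayed copy of the input."""
    out = []
    for i, x in enumerate(samples):
        echo = samples[i - delay_samples] if i >= delay_samples else 0.0
        out.append(x + mix * echo)
    return out


@dataclass
class ChannelStrip:
    name: str
    gain: float = 1.0      # linear gain, standing in for gain 164
    bypass: bool = False   # True when the switch routes around the ambience stage
    ambience: Callable[[Signal], Signal] = simple_ambience

    def process(self, samples: Signal) -> Signal:
        # With bypass engaged the signal passes on unprocessed.
        staged = list(samples) if self.bypass else self.ambience(samples)
        return [self.gain * s for s in staged]


if __name__ == "__main__":
    left_side = [1.0, 0.0, 0.0, 0.0]
    right_side = [0.0, 1.0, 0.0, 0.0]
    center_rear = combine(left_side, right_side)
    print(ChannelStrip("center_rear", gain=0.8).process(center_rear))
    print(ChannelStrip("lfe", bypass=True).process(center_rear))
```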
[0024] In a third stage 136, each channel 112, 114, 116, 118, 126, 128, 130 may be processed by an HRTF processor 138 to create a two-channel output via a left channel output 142 and a right channel output 140. The right channel output 140 and the left channel output 142 may be output via headphones with a sonic configuration of a right side 152 and a left side 150 and a virtual left side 148, virtual right side 144, virtual rear 146, virtual center 154 and virtual subwoofer 156.
[0025] According to an embodiment as shown in Figure 2, a left channel 204 and a right channel 202 from a stereo sound source 203 may be output into a left signal 208 and a right signal 206. The left signal 208 and the right signal 206 may be divided and multiplied into a plurality of additional virtual audio channels by a signal multiplier 280. The left signal 208 and right signal 206 may be divided and multiplied into a left front channel 214 and a right front channel 216. The left front channel 214 may be output 260 and the level may be controlled 274. The right front channel 216 may be output 254 and the level may be controlled 268. The left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an equalization (EQ) processor 226 to create a center front channel 234. The center front channel 234 may be output 262 and the level may be controlled 276. The left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an EQ processor 226 to create an LFE channel 228. The LFE channel 228 may be output 250 and the level may be controlled 264. The left signal 208 and the right signal 206 may be combined using a signal combiner 224 and processed using an EQ processor 226 to create a center rear channel 248. The center rear channel 248 may be output 256 and the level may be controlled 270. The left signal 208 may be processed using an EQ processor 226 and a dynamic equalizer 230 or stereo image enhancer 232 to create a left rear channel 236. The left rear channel 236 may be output 258 and the level may be controlled 272. The right signal 206 may be processed using an EQ processor 226 and a dynamic equalizer 230 or stereo image enhancer 232 to create a right rear channel 238. The right rear channel 238 may be output 252 and the level may be controlled 266.
[0026] According to an embodiment as shown in Figure 3, a multichannel sound source 301 may output a left channel 304, a right channel 302, a center channel 308, a LFE channel 310, a left rear channel 312 and a right rear channel 314, wherein the channels 304, 302, 308, 310, 312, 314 may be distributed by audio signal distribution matrix 390. The left rear channel 312 may be processed using an EQ processor 326 and a spatial processor 328, panned rear 362 and output 380. The right rear channel 314 may be processed using an EQ processor 326 and a spatial processor 328, panned rear 356 and output 382. The left channel 304 may be panned front 350, adjusted for gain 374 and output 380. The right channel 302 may be panned front 354, adjusted for gain 370 and output 382. The processed left rear channel 312 and right rear channel 314 may be combined using a signal combiner 324, processed, and adjusted for gain 372 to create a center rear channel 348. The center rear channel 348 may be panned rear 360 and output 382. The center channel 308 may be panned front 352 and output 380. The LFE channel 310 may be processed using an EQ processor 326, panned center 358 and output 382.
[0027] According to an embodiment as shown in Figure 4, a right channel 402 and a left channel 404 may go through an audio distribution bussing matrix 410 to create a plurality of additional audio channels. The additional channels may be selected for processing by an individual or group of spatial processors 403 to enhance individual directional spatial properties prior to being processed by a variable ambience processor 405. The variable ambience processor 432 may output a rear center channel 412, a center channel 414, an LFE channel 416, a right side channel 418, a left side channel 420, a right channel 422 and a left channel 424, which may be processed by an HRTF processor 438. The HRTF processor 438 may convert incoming audio into multidimensional audio whose immersive output may be monitored via headphones with a right side 452 and a left side 450 or loudspeakers with a virtual center 454, a virtual left side 448, a virtual right side 444, a virtual rear center 446, and a virtual subwoofer 456.
[0028] According to an embodiment as shown in Figure 5, a multichannel audio input 501 such as a DVD containing 5.1 channels of surround sound may be selected for processing by an individual or group of spatial processors 503 to enhance individual directional and spatial properties prior to being processed by a variable ambience processor 505. The variable ambience processor 532 may output a rear center channel 512, a center channel 514, an LFE channel 516, a right side channel 518, a left side channel 520, a right channel 522 and a left channel 524, which may be processed by an HRTF processor 538. The HRTF processor 538 may convert incoming audio into multidimensional audio whose immersive output may be monitored via headphones with a right side 552 and a left side 550 or loudspeakers with a virtual center 554, a virtual left side 548, a virtual right side 544, a virtual rear center 546, and a virtual subwoofer 556.
[0029] According to an embodiment as shown in Figure 6, a method 600 for creating three-dimensional binaural audio from audio signals on a left stereo sound channel and a right stereo sound channel is provided. A first step 605 includes creating one or more virtual channels, wherein for each virtual channel, the virtual channel is created by replicating the audio signal on the left stereo sound channel or the right stereo sound channel of incoming audio or by combining replicated audio signals of the left stereo sound channel and the right stereo sound channel of incoming audio. Alternatively, virtual channels may be created for one or both of the right stereo sound channel and the left stereo sound channel (i.e. to be used in place of the original channel), in which case subsequent steps of the method 600 may proceed using the virtual channels corresponding to those original channels. Step 610 includes assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point. Step 615 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix. Step 620 includes processing one or more of the audio signals on the left stereo sound channel, right stereo sound channel, and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio. Step 625 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
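Steps 605 and 610 of the method 600 may be illustrated with a minimal Python sketch. The channel names, the 0.5 scaling applied to the combined channels, and the azimuth and elevation values are illustrative assumptions and are not taken from this disclosure. The sketch stops before the directional processing and HRTF stages (steps 615 to 625).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Channel:
    """One original or virtual channel with a hypothetical matrix position."""
    samples: List[float]
    azimuth_deg: float    # 0 = front, positive = listener's right (assumed convention)
    elevation_deg: float  # 0 = ear level, 90 = directly above


# Hypothetical spatial assignment matrix: name -> (azimuth, elevation).
POSITIONS: Dict[str, Tuple[float, float]] = {
    "left": (-30.0, 0.0),
    "right": (30.0, 0.0),
    "center": (0.0, 0.0),
    "left_rear": (-110.0, 0.0),
    "right_rear": (110.0, 0.0),
    "center_rear": (180.0, 0.0),
}


def create_and_assign(left: List[float], right: List[float]) -> Dict[str, Channel]:
    """Step 605: replicate/combine the stereo pair; step 610: assign positions."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    # Combined channels take equal amounts of left and right (0.5 scaling is an assumption).
    mono_sum = [0.5 * (l + r) for l, r in zip(left, right)]
    signals = {
        "left": list(left),
        "right": list(right),
        "center": list(mono_sum),
        "left_rear": list(left),      # replica of the left signal
        "right_rear": list(right),    # replica of the right signal
        "center_rear": list(mono_sum),
    }
    return {
        name: Channel(samples, *POSITIONS[name])
        for name, samples in signals.items()
    }
```

Each returned entry carries both the replicated or combined signal and its assumed position, so later stages could select channels by position for directional processing.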
[0030] According to an embodiment as shown in Figure 7, a method 700 for creating three-dimensional binaural audio from audio signals on three or more audio channels is provided. Step 705 includes creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the three or more audio channels of incoming audio or by combining the audio signals on two or more of the three or more audio channels of incoming audio. As in the previous method 600, alternatively, virtual channels may be created for one or more of the original channels (i.e. to be used in place of the original channel), in which case subsequent steps of the method 700 may proceed using the virtual channels corresponding to those original channels. Step 710 includes assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point. Step 715 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix. Step 720 includes processing one or more of the audio signals on the three or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio. Step 725 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
[0031] According to an embodiment as shown in Figure 8, a method 800 for creating three-dimensional binaural audio from an audio signal on a mono audio channel is provided. Step 805 includes creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the mono audio channel of incoming audio. As in the previous methods 600 and 700, alternatively, a virtual channel may be created for the original channel (i.e. to be used in place of the original channel), in which case subsequent steps of the method 800 may proceed using the virtual channel corresponding to the original channel. Step 810 includes assigning each of the mono audio channel and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point. Step 815 includes processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix. Step 820 includes processing one or more of the audio signals on the mono audio channel and the virtual channels through one or more head-related transfer function (HRTF) processors to result in two or more audio output signals comprising three-dimensional binaural audio. Step 825 includes outputting the two or more audio output signals for recording, distribution, or output as sound.
[0032] According to a further embodiment, a method for creating three-dimensional, binaural audio from stereo, mono and multichannel sound sources by converting stereo two-channel audio into spatially immersive binaural audio may include replicating or electronically splitting two input channels of incoming audio into a plurality of additional virtual audio channels. The two input channels may include a left channel and a right channel. The term replicating as used herein is intended to encompass terms such as splitting, multiplying and copying to create either an identical or near-identical audio signal (which may vary due to factors such as signal loss, etc.) as would be appreciated by a person skilled in the field having knowledge of analog and digital audio processing. The corresponding system-related term used is signal multiplier, which is intended to encompass the various mechanisms for creating a replicated signal including splitting, electrically dividing, multiplying, etc.
[0033] The method may also include assigning the channels to a dimensional spatial assignment matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding a listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a listening point. In many instances, the listening point may be generally in the center between the virtual sound sources. However, it is also possible to place the listening point in a non-central point as long as the virtual sound sources partially or fully surround the point in some configuration. The method may further include selecting for processing some or all of the channel outputs to enhance individual directional spatial properties, either individually or in groups, prior to the collective output being summarily processed by a single or group of HRTF audio processors. The single or group of HRTF audio processors may convert incoming audio into multidimensional audio whose immersive output may be monitored via at least two equidistant loudspeakers or headphones. The channels may be individually adjusted for gain in relation to each other. Alternatively, the output may be received by an audio recording device, for example a computer configured to create digital audio files, which records the audio for playback at a later time.
[0034] The combined output of the two input channels and additional virtual audio channels may be processed together, each channel being selectable, for ambience, reverberation or room simulation, and may be adjusted for electronic gain, controlling audio channel levels in relation to each other and to the total output. Each channel may be selected individually or in groups, before being output to a virtual surround HRTF audio processor.
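The per-channel gain adjustment described above may be sketched as follows. The decibel-based interface and the function names `db_to_linear` and `apply_gains` are assumptions made for illustration; the disclosure does not specify how gain values are expressed.

```python
from typing import Dict, List


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gains(
    channels: Dict[str, List[float]],
    gains_db: Dict[str, float],
) -> Dict[str, List[float]]:
    """Scale each channel by its own gain; channels without an entry pass at 0 dB."""
    out: Dict[str, List[float]] = {}
    for name, samples in channels.items():
        g = db_to_linear(gains_db.get(name, 0.0))
        out[name] = [g * s for s in samples]
    return out
```

Because each channel has its own entry, levels can be set in relation to each other before the channels are passed on to the HRTF stage.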
[0035] According to an embodiment, the system and method may comprise at least five virtual channels, wherein the at least five virtual channels may be a left channel, right channel, center channel, left rear channel and right rear channel. According to a further embodiment, the system and method may comprise at least six virtual channels, wherein the at least six virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel and center rear channel. According to a further embodiment, the system and method may comprise at least seven virtual channels, wherein the at least seven virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel, center rear channel and low frequency effects (LFE) channel. According to a further embodiment, the system and method may comprise at least eight virtual channels, wherein the at least eight virtual channels may be a left channel, right channel, center channel, left rear channel, right rear channel, center rear channel, top channel and LFE channel. According to a further embodiment, the system and method may comprise more than eight virtual channels. The term virtual is used herein to denote that the channel is created and exists within the audio processing equipment of the systems and methods, in contrast to incoming audio signals received or audio output signals which may be output to loudspeakers or other sound-emitting, audio capture, audio recording, broadcast or playback devices.
[0036] The at least five, six, seven, eight or more virtual channels may be created to "open up" or "unfold" original tracks. Unfolded channels may be processed for accentuating localization frequency bands and audio delays individually before the collective output may be summarily processed by the single or group of HRTF processors and down-mixed to three-dimensional, two-channel virtual surround audio, which may occur collectively and simultaneously.
[0037] According to an embodiment, before merging into a deliverable three-dimensional binaural stereo output, created individualized tracks may be sent through processors generally independent of each other. Each group may have specific parameters applied to the group, which may relate to each physical area in space around a listener. This may create an effect equating dimensional spatiality and an improved sense of dimensional realism. Before merging into a deliverable three-dimensional stereo output, a total signal may be run through HRTF processors or a set of HRTF processors designed to add spatial attributes to the signal, which may make processing of the signal by the listener's head more realistic in terms of localization to human hearing and vestibular systems.
[0038] Equal amounts of electronic signal originating from the left channel and right channel, wherein the left channel may emit a left signal and the right channel may emit a right signal, may combine to create a virtual channel. The additional virtual channel may emit a signal comprising a sum of the left signal and right signal. The signal emitted by the additional virtual channel may also be electronically processed to enhance frequencies between 1 kHz and 5 kHz. According to an embodiment, the signal emitted by the additional virtual channel may be assigned spatially to a front center of the three-dimensional space surrounding a central listening point.
[0039] According to another embodiment, a signal emitted by an additional virtual channel may be electronically delayed by 6-10 milliseconds before being output, wherein the delayed signal may be assigned spatially to a center rear of the three-dimensional space surrounding a central listening point.
[0040] According to a further embodiment, the signal emitted by an additional virtual channel may be electronically delayed by 4-9 milliseconds before being output, wherein the delayed signal may be assigned spatially to a top of the three-dimensional space surrounding a central listening point.
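The delays described for the center rear and top channels may be sketched as a whole-sample delay line. The 48 kHz default sample rate, the rounding to the nearest sample and the zero-padding behaviour are assumptions for illustration only.

```python
from typing import List


def delay_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Number of whole samples corresponding to a delay in milliseconds."""
    return round(delay_ms * sample_rate_hz / 1000.0)


def delay_signal(
    samples: List[float],
    delay_ms: float,
    sample_rate_hz: int = 48000,
) -> List[float]:
    """Delay a signal by prepending zeros; the output is longer by the delay."""
    n = delay_samples(delay_ms, sample_rate_hz)
    return [0.0] * n + list(samples)
```

Under these assumptions, a delay in the 6-10 millisecond range corresponds to 288-480 samples at 48 kHz.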
[0041] Humans may locate sounds around themselves in a narrow band of frequencies ranging between 1 kHz and 4 kHz. Enhancing these frequencies may accentuate certain directional qualities and distinguish material between the originally input and the newly created audio tracks to better define specific elements within the audio content to the listener's liking. This may relate to more clarity between co-related channels. Enhancing the 1 kHz to 4 kHz band of frequencies may add a localizing component to the audio signal as humans are most aware of this band of frequencies within any audio program.
[0042] Furthermore, it may take sound approximately 6-10 milliseconds, when measured on a variety of human head sizes, for audio to travel from one side of the listener's head to another, front to back or side to side. Audio delay processors placed on specific channels based on known delay times may make desired audio appear to emanate from different locations around the listener's head when desirable.
[0043] According to an embodiment, the signal emitted by a virtual channel may be electronically processed to enhance frequencies between 0 Hz and 200 Hz and to reduce frequencies above 200 Hz. The signal emitted by this virtual channel may be assigned spatially to an LFE central position of the three-dimensional space surrounding a central listening point.
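One simple way to reduce frequencies above 200 Hz is a low-pass filter applied to a combined left and right signal. The sketch below uses a first-order (one-pole) low-pass as a stand-in; the filter type, the 0.5 scaling of the combined signal and the 48 kHz sample rate are assumptions, and a practical LFE path would likely use a steeper filter.

```python
import math
from typing import List


def one_pole_lowpass(
    samples: List[float],
    cutoff_hz: float = 200.0,
    sample_rate_hz: int = 48000,
) -> List[float]:
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out: List[float] = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def derive_lfe(
    left: List[float],
    right: List[float],
    cutoff_hz: float = 200.0,
    sample_rate_hz: int = 48000,
) -> List[float]:
    """Combine left and right in equal amounts, then keep the low band."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    return one_pole_lowpass(mono, cutoff_hz, sample_rate_hz)
```

A 50 Hz tone passes this filter almost unchanged, while a 5 kHz tone is attenuated by well over 20 dB.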
[0044] According to an embodiment, each electronic audio signal originating from the left channel and right channel, respectively, may create an additional virtual channel. Each additional virtual channel may comprise the signal originating from the left signal and the right signal, respectively. Each additional single channel comprising the signal originating from the left signal and the right signal, respectively, may be electronically processed to enhance frequencies between 1 kHz and 5 kHz and assigned spatially to the left rear and the right rear, respectively, of the three-dimensional space surrounding a central listening point. Each additional virtual channel comprising the signal originating from the left signal and the right signal, respectively, may be spatially enhanced via a processor with signal output from the right rear channel assignment and left rear channel assignment, respectively, to form individual left rear and right rear outputs.
[0045] According to an embodiment, the virtual channels assigned to the left rear channel and right rear channel signals may be initially multiplied into a replica of the original left signal and the original right signal. The signals may be then processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further. The left rear channel and the right rear channel may be run through a stereo enhancement processor, such as a spatializer, to add clarity. The left rear channel and the right rear channel may be variably processed together with an effect level that may be variable individually or as a group. The left rear channel and the right rear channel may join remaining channels by either selectively being processed with a room simulation or a reverberation processor, or sent directly to the HRTF processor set and stereo output.
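The upward bell curve centered on 2 kHz may be approximated with a single peaking equalizer. The sketch below uses the widely published biquad peaking-filter formulas from the Audio EQ Cookbook; the choice of this filter, the 4 dB boost, the Q of 0.7 and the 48 kHz sample rate are assumptions and are not specified in this disclosure.

```python
import cmath
import math
from typing import List, Tuple


def peaking_coeffs(
    center_hz: float = 2000.0,
    gain_db: float = 4.0,
    q: float = 0.7,
    sample_rate_hz: int = 48000,
) -> Tuple[List[float], List[float]]:
    """Return normalized (b, a) biquad coefficients for a peaking boost."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    # Normalize so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def biquad(samples: List[float], b: List[float], a: List[float]) -> List[float]:
    """Apply the filter in direct form I."""
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain_db_at(
    freq_hz: float,
    b: List[float],
    a: List[float],
    sample_rate_hz: int = 48000,
) -> float:
    """Magnitude response of the filter at one frequency, in decibels."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))
```

With these assumed settings the response peaks at 2 kHz and falls off toward 1 kHz and 4 kHz, leaving low frequencies essentially unchanged.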
[0046] According to an embodiment, the virtual channel assigned to the center channel may be initially derived from a sum of the original left signal and the original right signal. The combined monophonic signal may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further. The channel gain may be set slightly higher in relation to the left channel and the right channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
[0047] Audio signal delay units may electronically delay a signal in time from input to output. Reverberation units may be a type of audio delay, which may give a sense of space to incoming audio information.
[0048] According to an embodiment, the virtual channel assigned to the rear center channel may be initially derived from a sum of the original left signal and the original right signal. The summed monophonic signal may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further. The center rear channel may then run through a dynamic equalizer or other high frequency enhancement processor to add clarity and separation in the rear of the listener's head. The signal may be electronically delayed by 6-10 milliseconds before being output. The delayed signal may be assigned spatially to the center rear of the three-dimensional space surrounding a central listening point. This channel gain may be set slightly higher or lower in relation to the left rear channel and right rear channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set. For a top channel, the signal may be electronically delayed by 4-8 milliseconds before being output.
[0049] Dynamic equalization is a type of processing where an amount of tonal boost varies according to dynamics of a processed signal. Additional brightness may be achieved with dynamic equalization by adding a dynamic, high-frequency boost to sounds. Such dynamic effects may be dramatic as they may increase the tonal contrast within specific parts of music, rather than treating a whole mix the same. Most exciters or enhancers may combine elements of dynamic equalization with other processes, including harmonic synthesis and phase manipulation.
[0050] According to an embodiment, the virtual channel assigned to the LFE channel may be initially derived from a sum of the original left signal and right signal. The combined monophonic signal may then be equalized such that a frequency range of the signal may be electronically processed to enhance frequencies between 0 Hz and 200 Hz and reduce frequencies above 200 Hz. The signal may be assigned spatially to an LFE central position of the three-dimensional space surrounding a central listening point. If remaining desirable frequencies are completely diminished and remaining frequencies are given increased amplitude by more than 3 dB, the bass effect may be made more pronounced. Care may be exercised not to overdrive other stages with the channel level as the LFE may overdrive the HRTF processor. The channel gain may be set in relation to other channel gains before being sent directly to the HRTF processor set.
[0051] According to an embodiment, the original left channel and original right channel may be processed initially via an additional independent stereo enhancement processor. Original stereo tracks may provide a backbone to an overall mix and may be left alone until they are input to a final global processor with other new channels. By running all channels, except the LFE channel, through a final global ambience processor, a cohesion, unification and sense of believability may be output to the listener.
[0052] According to a further embodiment, a method for creating three-dimensional, two-channel binaural audio from stereo and multichannel surround sound may include electronically dividing and multiplying, i.e. replicating, a plurality of channels of incoming audio into a plurality of additional virtual audio channels. The incoming audio may include at least a left channel and a right channel. The method may also include assigning the channels to a dimensional spatial assignment matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding, according to one example configuration, a generally central listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a central listening point. The method may further include selecting for processing some or all of the channel outputs to enhance individual directional sonic properties relating to enhancing a perceived directionality of sound, either individually or in groups, prior to individual audio outputs being summarily processed by a single or group of ambience processors. The multi-directional output of the single or group of ambience processors may be monitored via at least five equidistant loudspeakers. The channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being output to the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
[0053] According to a further embodiment, a method for creating three-dimensional, two-channel binaural audio from monophonic (hereinafter "mono") sound sources by converting a mono audio signal to two or more channels of multichannel surround sound may include electronically dividing and multiplying (i.e. replicating) a single channel of incoming audio into a plurality of additional audio channels. The method may include assigning the channels to a spatial location matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding, for example, a generally central listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a central listening point. The method may further include selecting for processing some or all of the channel outputs to enhance individual directional sonic properties relating to enhancing a perceived directionality of sound, either individually or in groups, prior to individual audio outputs being summarily processed by a single or group of ambience processors. The channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being output to the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
[0054] According to a yet further embodiment, a method for creating three-dimensional, two-channel binaural audio from stereo and multichannel sound sources by converting multichannel audio signals to three-dimensional surround sound may include electronically dividing and multiplying (i.e. replicating) a plurality of at least four individual input channels of incoming audio. The at least four individual input channels may be a front left channel, front right channel, left rear channel and right rear channel. The method may also include assigning the channels to a dimensional spatial assignment matrix that may relate to the physical placement of a virtual sound source in space, situated within a three-dimensional space, surrounding a generally central listening point and relating to corresponding simulated loudspeaker locations arranged in a similar way around a central listening point. The method may further include selecting for processing some or all of the channel outputs to enhance individual directional spatial properties, either individually or in groups, prior to the collective output being summarily processed by a single or group of HRTF audio processors. The single or group of HRTF audio processors may convert incoming audio to multidimensional audio whose immersive stereo output may be monitored via at least two equidistant loudspeakers or headphones. The channels may be individually adjusted for gain in relation to each other. In most instances the virtual channels are adjusted to be louder than the originally input channel before being input into the final HRTF processors. However, this level control may be program dependent, user preferred or otherwise adjustable.
[0055] According to an embodiment, the front left channel and the front right channel may be summed to form an individual front center channel. The front center channel may be derived from generally equal amounts of the front left channel and front right channel. The rear left channel and rear right channel may be summed together to form an individual rear center channel. The rear center channel may be derived from generally equal amounts of the rear left channel and the rear right channel.
[0056] According to a further embodiment, the method may comprise a plurality of at least six individual input channels of incoming audio. The at least six individual input channels may be a front left channel, front right channel, left rear channel, right rear channel, front center channel and LFE channel.
[0057] Multichannel signals may be run through a process previously mixed or created for immersive three-dimensional audio delivery in four, five or more channels of sound and initially intended to be heard in a theatre system or a home theatre system, such as a DVD containing 5.1 channels of surround sound or a broadcast delivered in HDTV format containing 5.1 or 6.1 channel surround sound audio delivery means. This process may generally replace and work in substitute of the stereo input mode.
[0058] Individual incoming multichannels of audio may be initially assigned as labeled, left to left, right to right, left rear to left rear, etc. Groups or individuals whose position may equate to surrounding positions of loudspeakers in a multichannel speaker system are designed for surround sound playback audio, such as a home theater. Individual channel assignments may be processed individually with the stereo input mode where they may be processed to achieve a desired immersive audio localization effect.
[0059] According to an embodiment, the left rear channel and right rear channel signal may be untouched on input and may remain as the original incoming left rear signal and right rear signal. The signals may be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than the 1 kHz and 4 kHz resulting in an upward bell curve, the effect may be enhanced further. The left rear channel and right rear channel may run through a spatializer or other stereo enhancement to add clarity. The left rear channel and right rear channel may be processed together with an effect level that may be variable individually or as a group. The left rear channel and right rear channel may join remaining channels by either selectively being processed with a room simulation or reverberation processor, or sent directly to the HRTF processor set and output.
[0060] If the front center channel is included in the incoming multichannel audio, a center channel may be assigned to a center channel and may contain the same information as contained within the original incoming material. If a center channel is not originally input, such as in a Dolby surround or four-channel surround mix, a channel assigned to the front center channel may be initially derived from a sum of the original left signal and original right signal. This combined monophonic signal may then be processed so that the frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than 1 kHz and 4 kHz, resulting in an upward bell curve, the effect may be enhanced further. The channel gain may be set slightly higher in relation to the left front channel and right front channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set.
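Deriving a missing front center channel from the left and right signals can be sketched as a sample-by-sample sum followed by a small gain trim. In this sketch the 0.5 scaling of the sum (to avoid clipping) and the 1 dB default trim are assumptions chosen for illustration; the description only says the gain may be set "slightly higher" than the front pair.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def derive_center(left, right, trim_db=1.0):
    """Sum left and right into a mono center channel.

    The 0.5 factor is an assumed headroom choice, not from the source.
    trim_db is a hypothetical 'slightly higher' gain relative to the fronts.
    """
    gain = 0.5 * db_to_linear(trim_db)
    return [gain * (l + r) for l, r in zip(left, right)]

left = [0.5, 0.25, -0.5]
right = [0.5, -0.25, 0.0]
center = derive_center(left, right)
print(center)
```

Content that is identical in both inputs survives the sum at full level, while content that is equal and opposite cancels, which is why a summed channel tends to carry the material panned to the middle of the stereo image.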
[0061] According to an embodiment, the channel assigned to the rear center channel may be initially derived from a sum of the original left rear signal and right rear signal. The summed monophonic signal may then be processed such that a frequency range of 1 kHz to 4 kHz may be audibly enhanced or made louder in relation to the rest of the frequency range. If 2 kHz is pushed higher than 1 kHz and 4 kHz, resulting in an upward bell curve, the effect may be enhanced further. The center rear channel may then run through a dynamic equalizer or other high frequency enhancement processor to add clarity and separation in the rear. The signal may be electronically delayed by 6-10 milliseconds before being output. The delayed signal may be assigned spatially to the center rear of the three-dimensional space surrounding a listening point. This channel gain may be set slightly higher or lower in relation to the left rear channel and right rear channel gain before being sent to join remaining channels together by either selectively being processed with global room simulation or reverberation processors, such as ambience processors, or sent directly to the HRTF processor set. For a top channel, the signal may be electronically delayed by 4-8 milliseconds before being output.
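The delay applied to the rear center channel can be illustrated by converting milliseconds to samples and prepending silence. This is a sketch under assumed values: a 48 kHz sample rate and an 8 ms delay picked from the middle of the 6-10 millisecond range; the function names are hypothetical.

```python
def delay_in_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000)

def derive_rear_center(left_rear, right_rear, sample_rate, delay_ms=8.0):
    """Sum the rear pair to mono, then delay it by prepending silence."""
    mono = [l + r for l, r in zip(left_rear, right_rear)]
    return [0.0] * delay_in_samples(delay_ms, sample_rate) + mono

rear_center = derive_rear_center([0.2, 0.4], [0.1, 0.1], sample_rate=48000)
print(delay_in_samples(6, 48000), delay_in_samples(10, 48000))  # 288 480
print(len(rear_center))  # 386
```

The same helper would cover the top channel by passing a delay in the 4-8 millisecond range mentioned at the end of the paragraph.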
[0062] If an original LFE channel is included in the incoming multichannel audio, an LFE channel may be assigned to an LFE channel and may contain the same information as contained within the original incoming material. If an LFE channel is not included in the incoming audio, a channel assigned to the LFE channel may be initially derived from a sum of the original left signal and original right signal. The summed monophonic signal may be processed to enhance frequencies between 0 Hz and 200 Hz and to reduce frequencies above 200 Hz. The signal may be assigned spatially to an LFE central position of the three-dimensional space surrounding a listening point. If remaining desirable frequencies are completely diminished and inaudible, and remaining audible frequencies are given increased amplitude by greater than 3 dB, the bass effect may be made more pronounced. Care may be exercised not to overdrive other stages with the channel level as the LFE may overdrive the HRTF processor. The channel gain may be set in relation to the other channels before being sent directly to the HRTF processor set.
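A derived LFE channel can be sketched as a mono sum passed through a low-pass filter with a 200 Hz cutoff and then boosted. The one-pole filter below is a deliberately simple stand-in: it rolls off gently (about 6 dB per octave), so a practical system would likely use a steeper filter. The 0.5 sum scaling, the 3 dB default boost and the function name are assumptions.

```python
import math

def derive_lfe(left, right, sample_rate, cutoff_hz=200.0, boost_db=3.0):
    """Sum left and right, low-pass the result, then apply a bass boost."""
    # One-pole low-pass smoothing coefficient for the chosen cutoff.
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    gain = 10 ** (boost_db / 20)
    state = 0.0
    out = []
    for l, r in zip(left, right):
        mono = 0.5 * (l + r)           # assumed headroom scaling
        state += k * (mono - state)    # low-pass: track slow changes only
        out.append(gain * state)
    return out

# A steady (0 Hz) input passes through; a rapidly alternating one is reduced.
steady = derive_lfe([1.0] * 5000, [1.0] * 5000, 48000, boost_db=0.0)
fast = [1.0 if i % 2 == 0 else -1.0 for i in range(2000)]
rapid = derive_lfe(fast, fast, 48000, boost_db=0.0)
print(round(steady[-1], 3), round(max(abs(v) for v in rapid[-100:]), 3))
```

Keeping the boost modest reflects the caution in the paragraph above about the LFE level overdriving later stages.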
[0063] The original left front channel and original right front channel may be processed initially via an additional independent stereo enhancement processor. Original stereo tracks may provide a backbone to an overall mix until inputted to a final global processor with all newly created channels. By running all channels, except the LFE channel, through a final global ambience processor, a cohesion, unification and sense of believability may be output to the listener.
[0064] There may be a certain amount of variability for the specific gain levels sent to the processors and final HRTF processors. An "incoming material-dependent" situation may arise where amounts of gain applied in different stages change depending on audio material type supplied. For example, classical music captured in a theatre may require a different approach or processor type compared to ambient material provided within a cinematic film. Further, live rock music may require a different processor and ambience amount sent to the total output than jazz music. There may be variability in an amount of initial spatial panning assignment when the channels are originally separated to increase believability and to avoid inter-channel phase problems, before the signals reach the processors.
[0065] After the signals are sent individually and before the signals re-combine into stereo, there may be a possibility of applying a global effect.
[0066] Using high quality audio pre-amplifiers with a clean sonic signal to boost the signal at the front-end of the process while acquiring incoming audio may help "open up" the sound for splitting. Additionally, utilizing the benefits of tube or valve pre-amplification, whether it is analog electronics or simulated digitally, with its subtly added harmonics may further separate the incoming material into enhanced frequency divisions. Some audio level compression used on the incoming audio may also help to smooth out the entire process. Some gain makeup at the end of the process, including additional compression and equalization (EQ), may be used to make up for any frequency or amplitude deficiencies added by the total process and to smooth out any acquired amplitude transients that may have been added by additionally selected processors. Compression and EQ may be used to make the total three-dimensional mix sound more pleasing to the listener's ear or to the intended listening market.
[0067] Signal processors may enhance specific frequency bands or audible delays. After processing, affected signals may differ audibly from the original signal. A signal boost of at least 3 dB in the described frequency range on certain channels may be required to hear a difference on affected channels. The amount above 3 dB may be variable. Processors commonly known in the art that may be used, and that may substitute for each other if slightly different effects are desired, include the following. An audio equalizer is a processor for adjusting the balance between frequency components within an audible electronic signal. A harmonic generator may be an audio signal processing technique used to enhance a signal by dynamic equalization, phase manipulation, harmonic synthesis of high frequency signals, and through adding subtle harmonic distortion. A harmonic generator may be further used to synthesize harmonics of low frequency signals to simulate deep bass in smaller speakers. Harmonic synthesis may involve creating higher order harmonics from fundamental frequency signals present in a recording. An "exciter" processor may generate high frequency components that may not be part of an original signal by employing a non-linear distortion process resembling overdrive and distortion effects. The "exciter" processor may pass an input signal through a high-pass filter before feeding the input signal into the harmonics (distortion) generator, which may result in artificial harmonics being added to the original signal. The artificial harmonics added to the original signal may contain frequencies at least one octave above a threshold of the high-pass filter. A distorted signal may be mixed with the original signal. Finally, a stereophonic image enhancer may be used, such as the system and apparatus disclosed in U.S. Patent No. 5,412,731, entitled "AUTOMATIC STEREOPHONIC MANIPULATION SYSTEM AND APPARATUS FOR IMAGE ENHANCEMENT," published on May 2, 1995, wherein the spatializer technology may manipulate the original signal for the listener to perceive a stereo image beyond boundaries of two loudspeakers and place sound in front of the listener in an arc of 180 degrees. As humans hear in 360 degrees and in a three-dimensional space, 180 degrees of audio may not be sufficient to supply immersive audio. However, stereophonic image enhancement may be used for some specific channel sets to achieve a heightened sense of depth and space. For example, stereophonic image enhancement on rear channels may add immeasurably to a perceived space of a program.
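The "exciter" signal path described above (high-pass filter, non-linear distortion, mix with the original) can be sketched as follows. This is an illustrative stand-in rather than any particular commercial processor: the one-pole high-pass filter, the tanh soft-clipping curve, and the cutoff, drive and mix values are all assumptions.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """Simple first-order high-pass filter."""
    a = 1.0 / (1.0 + 2.0 * math.pi * cutoff_hz / sample_rate)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def exciter(samples, sample_rate, cutoff_hz=3000.0, drive=4.0, mix=0.2):
    """High-pass the input, distort it to create harmonics, mix with the dry signal."""
    high = one_pole_highpass(samples, cutoff_hz, sample_rate)
    # tanh soft clipping is a non-linear stage that adds harmonics.
    harmonics = [math.tanh(drive * h) for h in high]
    return [x + mix * d for x, d in zip(samples, harmonics)]

# A 5 kHz test tone at 48 kHz, processed with the assumed settings.
tone = [0.5 * math.sin(2 * math.pi * 5000 * n / 48000) for n in range(480)]
excited = exciter(tone, 48000)
print(len(excited))  # 480
```

Because only the high-passed portion reaches the distortion stage, low frequency content passes through essentially unchanged, while the `mix` parameter controls how much of the generated harmonic content is blended back in.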
[0068] HRTF processors are described in U.S. Patent No. 6,980,661, entitled "METHOD OF AND APPARATUS FOR PRODUCING APPARENT MULTIDIMENSIONAL SOUNDS," published on December 27, 2005. HRTF processors may take incoming audio and create a three-dimensional effect over headphones. HRTF processors may be commercially available from companies such as Dolby Laboratories, QSound, DTS, and Zoran Corporation. HRTF processors may focus on sound quality first, with adjustability and variability from source to source within the process, which may make the process more flexible and program-dependent as certain types of incoming music may work best with certain settings.
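As a rough illustration of what an HRTF processor does, each positioned channel can be convolved with a left-ear and a right-ear impulse response for its assigned position, and the results summed into two output signals. The sketch below is not the method of the cited patent or of any named vendor. The impulse responses are toy placeholders (a direct path for the near ear, a delayed and attenuated path for the far ear); real systems would use measured head-related impulse responses. All names are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(channels, hrirs):
    """Mix positioned channels down to (left_ear, right_ear) signals.

    channels: dict of channel name -> list of samples
    hrirs:    dict of channel name -> (left_ear_ir, right_ear_ir)
    """
    length = max(
        len(samples) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, samples in channels.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        ir_left, ir_right = hrirs[name]
        for i, v in enumerate(convolve(samples, ir_left)):
            left[i] += v
        for i, v in enumerate(convolve(samples, ir_right)):
            right[i] += v
    return left, right

# Toy placeholder responses: a source on the listener's left reaches the
# left ear directly and the right ear later and quieter.
channels = {"front_left": [1.0, 0.0]}
hrirs = {"front_left": ([1.0], [0.0, 0.0, 0.5])}
left_ear, right_ear = render_binaural(channels, hrirs)
print(left_ear)   # [1.0, 0.0, 0.0, 0.0]
print(right_ear)  # [0.0, 0.0, 0.5, 0.0]
```

The two returned lists correspond to the two output signals intended for headphone playback; adding more entries to `channels` and `hrirs` would mix additional virtual sound sources into the same pair.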
[0069] Targeting specific frequency bands that may relate to the localization function in a human nervous system may stimulate the human vestibular and localization systems. Resulting media may play back on any type of media player without degradation of the original signal or total signal loss when played through certain loudspeaker configurations via phase cancellation. Adding virtual height and center rear channels to the three-dimensional soundscape may result in an improvement over an audio signal that was encoded via MP3.
[0070] Any previously recorded stereo, mono or multichannel sound information may be used as a source. The sound information may be processed for real-time playback via binaural headphones or other stereo listening or broadcast or recorded for future playback in enhanced three-dimensional binaural audio. The sound information may be processed to provide a three-dimensional feeling of personal multichannel surround sound while maintaining integrity of the original material. Resulting audio delivery may be of a high quality and may provide HRTF audio cues within an audio program required for the listener to internally process the audio and its location in a three-dimensional space. The processed audio may be experienced by the listener over headphones or two equidistant speakers as situated in a triangle with reference to the listener, such as stereo speakers on a laptop computer.
[0071] The present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Certain adaptations and modifications of the invention will be obvious to those skilled in the art. Therefore, the presently discussed embodiments are considered to be illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than the foregoing description and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

What is claimed is:
1. A method for creating three-dimensional binaural audio from audio signals on a left stereo sound channel and a right stereo sound channel, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the left stereo sound channel or the right stereo sound channel of incoming audio or by combining replicated audio signals of the left stereo sound channel and the right stereo sound channel of incoming audio; assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the left stereo sound channel, the right stereo sound channel, and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
2. The method of claim 1, wherein the one or more sound-emitting devices are headphones.
3. The method of claim 1, wherein the one or more sound-emitting devices are loudspeakers.
4. The method of claim 3, wherein the loudspeakers are two equidistant loudspeakers.
5. The method of claim 3, wherein the loudspeakers comprise at least five equidistant loudspeakers.
6. The method of claim 1, wherein the virtual channels comprise a front center channel, left rear channel and right rear channel.
7. The method of claim 6, wherein the virtual channels further comprise a center rear channel.
8. The method of claim 6, wherein the virtual channels further comprise a low frequency effects (LFE) channel.
9. The method of claim 6, wherein the virtual channels further comprise a top channel.
10. The method of claim 1, wherein the channels are individually adjusted for gain in relation to each other.
11. The method of claim 1, wherein processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix comprises processing to enhance frequencies between 1 kHz and 5 kHz.
12. The method of claim 8, wherein processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix comprises, for the LFE channel, processing to enhance frequencies between 0 Hz and 200 Hz and to reduce frequencies above 200 Hz.
13. The method of claim 7, wherein processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix, for the center rear channel, comprises processing to delay the audio signal by 6 to 10 milliseconds.
14. The method of claim 9, wherein processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix, for the top channel, comprises processing to delay the audio signal by 4 to 9 milliseconds.
15. The method of claim 1, further comprising processing one or more of the audio signals on the left stereo sound channel, right stereo sound channel, and the virtual channels for one or more of the following before the audio signals are processed by the one or more HRTF processors: ambience, reverberation or room simulation, gain, and level.
16. The method of claim 1, wherein assigning each of the left stereo sound channel, right stereo sound channel, and the virtual channels to a position in a three-dimensional spatial assignment matrix includes assigning a virtual channel replicating one or both of the left stereo sound channel and the right stereo sound channel to a position in the three-dimensional spatial assignment matrix in place of the original left stereo sound channel or right stereo sound channel or both.
17. The method of claim 1, further comprising adjusting the gain on the left stereo sound channel, the right stereo sound channel, and the virtual channels in relation to each other.
18. A method for creating three-dimensional binaural audio from audio signals on three or more audio channels, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the three or more audio channels of incoming audio or by combining the audio signals on two or more of the three or more audio channels of incoming audio; assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the three or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
19. The method of claim 18, wherein the three or more audio channels comprise a front left channel, front right channel, left rear channel and right rear channel.
20. The method of claim 19, wherein the front left channel and the front right channel are combined to form a front center virtual channel.
21. The method of claim 19, wherein the rear left channel and rear right channel are combined to form a rear center virtual channel.
22. The method of claim 18, wherein assigning each of the three or more audio channels and the virtual channels to a position in a three-dimensional spatial assignment matrix includes assigning a virtual channel replicating one or more of the three or more audio channels to a position in the three-dimensional spatial assignment matrix in place of the original one or more of the three or more audio channels.
23. A system for creating three-dimensional binaural audio from audio signals on two or more audio channels, the system comprising: a signal multiplier for creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on one of the two or more audio channels of incoming audio or by combining the replicated audio signals on two or more of the two or more audio channels of incoming audio using a signal combiner; wherein each of the two or more audio channels and the virtual channels is assigned to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; one or more audio processors for processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; one or more head-related transfer function (HRTF) audio processors for processing one or more of the audio signals on the two or more audio channels and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and one or more sound-emitting devices for receiving and emitting the two or more audio output signals as sound or an audio recording device for receiving and recording the two or more audio output signals for future playback or an audio distribution device for distributing the audio output signals.
24. A method for creating three-dimensional binaural audio from an audio signal on a mono audio channel, the method comprising: creating one or more virtual channels, wherein for each virtual channel the virtual channel is created by replicating the audio signal on the mono audio channel of incoming audio; assigning each of the mono audio channel and the virtual channels to a position in a three-dimensional spatial assignment matrix, wherein the position assigned to each channel corresponds to the position of a virtual sound source in a three-dimensional space surrounding a listening point and each position has an associated direction that is facing the listening point; processing the audio signal on one or more of the virtual channels to enhance their individual directional and spatial properties based on their assigned position and associated direction in the three-dimensional spatial assignment matrix; processing one or more of the audio signals on the mono audio channel and the virtual channels through one or more head-related transfer function (HRTF) processors to result into two or more audio output signals comprising three-dimensional binaural audio; and outputting the two or more audio output signals for recording, distribution, or output as sound.
PCT/CA2017/050384 2016-03-29 2017-03-29 A system and method for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources WO2017165968A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662314540P 2016-03-29 2016-03-29
US62/314,540 2016-03-29

Publications (1)

Publication Number Publication Date
WO2017165968A1 true WO2017165968A1 (en) 2017-10-05

Family

ID=59962358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2017/050384 WO2017165968A1 (en) 2016-03-29 2017-03-29 A system and method for creating three-dimensional binaural audio from stereo, mono and multichannel sound sources

Country Status (1)

Country Link
WO (1) WO2017165968A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7889870B2 (en) * 2006-01-10 2011-02-15 Samsung Electronics Co., Ltd Method and apparatus to simulate 2-channel virtualized sound for multi-channel sound
US20140064526A1 (en) * 2010-11-15 2014-03-06 The Regents Of The University Of California Method for controlling a speaker array to provide spatialized, localized, and binaural virtual surround sound
US20140355795A1 (en) * 2013-05-29 2014-12-04 Qualcomm Incorporated Filtering with binaural room impulse responses with content analysis and weighting
WO2015010937A2 (en) * 2013-07-22 2015-01-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Renderer controlled spatial upmix
US9154896B2 (en) * 2010-12-22 2015-10-06 Genaudio, Inc. Audio spatialization and environment simulation
US20160029139A1 (en) * 2013-04-19 2016-01-28 Electronics And Techcommunications Research Institute Apparatus and method for processing multi-channel audio signal
US9258664B2 (en) * 2013-05-23 2016-02-09 Comhear, Inc. Headphone audio enhancement system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10964300B2 (en) 2017-11-21 2021-03-30 Guangzhou Kugou Computer Technology Co., Ltd. Audio signal processing method and apparatus, and storage medium thereof
CN108156575A (en) * 2017-12-26 2018-06-12 广州酷狗计算机科技有限公司 Processing method, device and the terminal of audio signal
CN108156561A (en) * 2017-12-26 2018-06-12 广州酷狗计算机科技有限公司 Processing method, device and the terminal of audio signal
CN108156575B (en) * 2017-12-26 2019-09-27 广州酷狗计算机科技有限公司 Processing method, device and the terminal of audio signal
EP3618461A4 (en) * 2017-12-26 2020-08-26 Guangzhou Kugou Computer Technology Co., Ltd. Audio signal processing method and apparatus, terminal and storage medium
US10924877B2 (en) 2017-12-26 2021-02-16 Guangzhou Kugou Computer Technology Co., Ltd Audio signal processing method, terminal and storage medium thereof
US11039261B2 (en) 2017-12-26 2021-06-15 Guangzhou Kugou Computer Technology Co., Ltd. Audio signal processing method, terminal and storage medium thereof
US11315582B2 (en) 2018-09-10 2022-04-26 Guangzhou Kugou Computer Technology Co., Ltd. Method for recovering audio signals, terminal and storage medium
US11246001B2 (en) 2020-04-23 2022-02-08 Thx Ltd. Acoustic crosstalk cancellation and virtual speakers techniques

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17772903

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17772903

Country of ref document: EP

Kind code of ref document: A1