GB2565747A - Enhancing loudspeaker playback using a spatial extent processed audio signal - Google Patents


Info

Publication number
GB2565747A
GB2565747A GB1706288.6A GB201706288A GB2565747A GB 2565747 A GB2565747 A GB 2565747A GB 201706288 A GB201706288 A GB 201706288A GB 2565747 A GB2565747 A GB 2565747A
Authority
GB
United Kingdom
Prior art keywords
channel
channel audio
audio signal
audio signals
spatially extended
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1706288.6A
Other versions
GB201706288D0 (en)
Inventor
Antti Johannes Eronen
Jussi Artturi Leppänen
Tapani Johannes Pihlajakuja
Arto Juhani Lehtiniemi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1706288.6A priority Critical patent/GB2565747A/en
Publication of GB201706288D0 publication Critical patent/GB201706288D0/en
Priority to PCT/FI2018/050277 priority patent/WO2018193163A1/en
Priority to EP18787484.7A priority patent/EP3613221A4/en
Publication of GB2565747A publication Critical patent/GB2565747A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/005 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo five- or more-channel type, e.g. virtual surround
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/07 Synergistic effects of band splitting and sub-band processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones

Abstract

An apparatus for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source 211. The apparatus is configured to receive and/or determine at least one channel audio signal associated with the at least one sound source before generating at least one spatially extended audio signal 321 based on the at least one channel audio signal. The at least one channel audio signal and the at least one spatially extended channel audio signal are then combined to generate at least two output channel audio signals. Preferably, the apparatus receives two neighbouring channel audio signals and analyses them to determine a cross-channel movement parameter. Preferably, at least two spatially extended audio signals are generated based on the cross-channel movement parameter. The cross-channel movement parameter can be determined based on analysis of level changes of a joint audio component within the two neighbouring channel audio signals, the joint audio component being determined from a frequency band of the two neighbouring channels which has a correlation greater than a certain value.

Description

ENHANCING LOUDSPEAKER PLAYBACK USING A SPATIAL EXTENT
PROCESSED AUDIO SIGNAL
Field
The present application relates to apparatus and methods for enhancing loudspeaker playback using a spatial extent processed audio signal.
Background
Capture of audio signals from multiple sources and mixing of audio signals when these sources are moving in the spatial field requires significant effort. For example the capture and mixing of an audio signal source such as a speaker or artist within an audio environment such as a theatre or lecture hall to be presented to a listener and produce an effective audio atmosphere requires significant investment in equipment and training.
A commonly implemented system is one where one or more ‘external’ microphones, for example a Lavalier microphone worn by the user or an audio channel associated with an instrument, are mixed with a suitable spatial (or environmental or audio field) audio signal such that the produced sound comes from an intended direction. This system is known in some areas as Spatial Audio Mixing (SAM).
The SAM system enables the creation of immersive sound scenes comprising “background spatial audio” or ambiance and sound objects for Virtual Reality (VR) applications. Often, the scene can be designed such that the overall spatial audio of the scene, such as a concert venue, is captured with a microphone array (such as one contained in the OZO virtual camera) and the most important sources captured using the ‘external’ microphones.
The term spatial extent or spatial spread refers to the degree of localization associated with a sound object. The sound object is point-like when its spatial extent is at minimum. With a larger spatial extent, the sound is perceived from more than one direction simultaneously.
A common method to playback sounds using loudspeakers is to use amplitude panning. In amplitude panning, a ‘sound object’ is positioned between a loudspeaker pair (or inside a loudspeaker triplet) by mixing it to several loudspeakers simultaneously using suitable gain parameters. As a result, humans perceive a virtual audio object between the loudspeakers (or in the middle of a loudspeaker triplet). When a sound position exactly coincides with a position of a loudspeaker, the sound is played only from that loudspeaker.
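For illustration, the following is a minimal sketch of pairwise amplitude panning using the well-known tangent panning law; the function name and the energy normalisation choice are illustrative assumptions, not details taken from this patent:

```python
import numpy as np

def tangent_law_gains(phi_deg, phi0_deg=45.0):
    """Tangent-law amplitude panning gains for a symmetric loudspeaker pair.

    phi_deg:  source azimuth relative to the centre of the pair
              (positive towards the 'right' loudspeaker of the pair).
    phi0_deg: half the angular span of the pair (e.g. 45 for a 90 degree pair).
    Returns (g_left, g_right), normalised so that g_l**2 + g_r**2 == 1,
    which keeps the total energy constant across pan positions.
    """
    r = np.tan(np.radians(phi_deg)) / np.tan(np.radians(phi0_deg))
    g_left, g_right = 1.0 - r, 1.0 + r
    norm = np.hypot(g_left, g_right)
    return g_left / norm, g_right / norm

# A source exactly at a loudspeaker plays from that loudspeaker only:
print(tangent_law_gains(45.0))   # -> (0.0, 1.0)
print(tangent_law_gains(0.0))    # -> (~0.707, ~0.707)
```

Note how a source that coincides with a loudspeaker position is reproduced by that loudspeaker alone, exactly as described above.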
A known limitation of loudspeaker sound reproduction with amplitude panning is that the perceived spatial extent of a sound object may vary depending on the number of loudspeakers which currently play back the sound object. Depending on the panning direction, the number of active loudspeakers varies, and it has been observed that undesired effects such as changing spatial spread and spectral coloration may occur because of this.
Furthermore, when rendering loudspeaker format signals, especially as virtual loudspeaker signals with headphones, some users tend to perceive the virtual speaker locations. When a sound moves around the listener, the listener can actually hear the sound signal level decrease in one virtual speaker and increase in another. However, it would instead be desirable to make the “loudspeakers disappear” and thus create a uniform sound stage around the listener where only the sound sources are perceived, not the loudspeakers. This is because the end goal is to create an immersive sound scene around the user, as if they were surrounded by the intended sonic environment. This illusion will break if the user perceives speaker locations, as they will feel as if they are situated in the middle of a set of loudspeakers instead of being virtually transported to the intended, created sonic environment.
Summary
There is provided according to a first aspect an apparatus for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the apparatus configured to: receive and/or determine at least one channel audio signal associated with the at least one sound source; generate at least one spatially extended audio signal based on the at least one channel audio signal; combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
The apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source may be configured to receive at least two neighbouring channel audio signals.
The apparatus may be further configured to analyse the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein the apparatus configured to generate the at least one spatially extended audio signal may be configured to apply a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
The apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may be configured to combine: a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
The apparatus configured to analyse the at least one channel audio signal to determine at least one cross-channel movement parameter may be further configured to: determine at least one joint audio component within the at least two neighbouring channel audio signals; and determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
The apparatus configured to determine at least one joint audio component within the at least two neighbouring channel audio signals may be further configured to determine at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
The apparatus configured to determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may be configured to determine a cross-channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
The apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source within the sound scene may be configured to: receive a sound source based audio signal; generate the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
The apparatus may be further configured to analyse position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
The apparatus configured to generate the at least one spatially extended audio signal may be configured to apply a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis may be controlled based on the at least one cross-channel movement parameter.
The apparatus configured to analyse position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may be configured to determine a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
The apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may be configured to: determine at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; combine the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
The apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal may be configured to apply at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
The apparatus configured to apply a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may be configured to: determine a spatially extending parameter; and determine at least one position associated with the at least one channel audio signal; determine at least one frequency band position based on the at least one position and the spatial extent parameter; and generate panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
The apparatus may be further configured to determine a position of the at least one channel relative to the apparatus.
The spatially extending synthesis vector base amplitude panning may be configured to be controlled such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
The at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
The apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal may be configured to generate at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
The apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source may be configured to receive at least two channel audio signals, wherein the apparatus may be configured to: selectively generate the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combine the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
The apparatus configured to receive at least one channel audio signal may be further configured to receive at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based audio signal from which further channel based audio signals are determined.
According to a second aspect there is provided a method for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the method comprising: receiving and/or determining at least one channel audio signal associated with the at least one sound source; generating at least one spatially extended audio signal based on the at least one channel audio signal; combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
Receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise receiving at least two neighbouring channel audio signals.
The method may comprise analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein generating the at least one spatially extended audio signal may comprise applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
Combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise combining: a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
Analysing the at least one channel audio signal to determine at least one cross-channel movement parameter may comprise: determining at least one joint audio component within the at least two neighbouring channel audio signals; and determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
Determining at least one joint audio component within the at least two neighbouring channel audio signals may comprise determining at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
Determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may comprise determining a cross-channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
Receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene may comprise: receiving a sound source based audio signal; generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
The method may comprise analysing position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
Generating the at least one spatially extended audio signal may comprise applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis is controlled based on the at least one cross-channel movement parameter.
Analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may comprise determining a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
Combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise: determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
Generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise applying at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
Applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may comprise: determining a spatially extending parameter; determining at least one position associated with the at least one channel audio signal; determining at least one frequency band position based on the at least one position and the spatial extent parameter; and generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
The method may further comprise determining a position of the at least one channel relative to the apparatus.
Controlling the spatially extending synthesis vector base amplitude panning may be such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
The at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
Generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise generating at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
Receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise receiving at least two channel audio signals, wherein the method may further comprise: selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
Receiving at least one channel audio signal may further comprise receiving at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based from which further channel based audio signals are determined.
According to a third aspect there is provided an apparatus for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the apparatus comprising: means for receiving and/or determining at least one channel audio signal associated with the at least one sound source; means for generating at least one spatially extended audio signal based on the at least one channel audio signal; means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
Means for receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise means for receiving at least two neighbouring channel audio signals.
The apparatus may further comprise means for analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein means for generating the at least one spatially extended audio signal may comprise means for applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
The means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise means for combining: a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
The means for analysing the at least one channel audio signal to determine at least one cross-channel movement parameter may further comprise: means for determining at least one joint audio component within the at least two neighbouring channel audio signals; and means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
The means for determining at least one joint audio component within the at least two neighbouring channel audio signals may comprise means for determining at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
The means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may comprise means for determining a cross-channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
The means for receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene may comprise: means for receiving a sound source based audio signal; means for generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
The apparatus may further comprise means for analysing position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
The means for generating the at least one spatially extended audio signal may comprise means for applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis may be controlled based on the at least one cross-channel movement parameter.
The means for analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may comprise means for determining a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
The means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise: means for determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; means for combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
The means for generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise means for applying at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
The means for applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may comprise: means for determining a spatially extending parameter; means for determining at least one position associated with the at least one channel audio signal; means for determining at least one frequency band position based on the at least one position and the spatial extent parameter; and means for generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
The apparatus may further comprise means for determining a position of the at least one channel relative to the apparatus.
The means for controlling the spatially extending synthesis vector base amplitude panning may be such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
The at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
The means for generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise means for generating at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
The means for receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise means for receiving at least two channel audio signals, wherein the apparatus may comprise: means for selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and means for combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
The means for receiving at least one channel audio signal may further comprise means for receiving at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based audio signal from which further channel based audio signals are determined.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
Summary of the Figures
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically an example loudspeaker playback system employing a spatially extended audio signal according to some embodiments;
Figure 2 shows schematically an example loudspeaker arrangement using amplitude panning of sound sources;
Figure 3 shows schematically an example loudspeaker arrangement using spatially extended audio signals as part of a panning of the sound sources according to some embodiments;
Figure 4 shows schematically the example loudspeaker playback system employing a spatially extended audio signal as shown in Figure 1 in further detail according to some embodiments;
Figure 5 shows a flow diagram of the operation of the example loudspeaker playback system shown in Figure 4 according to some embodiments;
Figure 6 shows schematically the example loudspeaker playback system shown in Figure 5 in further detail with respect to channel based input audio signals according to some embodiments;
Figure 7 shows schematically the spatial extent synthesizer shown in Figures above in further detail according to some embodiments; and
Figure 8 shows schematically an example device suitable for implementing the apparatus shown above according to some embodiments.
Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for improving the quality of playback of moving sources over loudspeakers or other defined location output channels. In particular the embodiments described herein attempt to maintain a consistency of sound spatial spread and timbral quality. The embodiments described herein and the methods and apparatus apply to both physical loudspeakers and virtual loudspeaker playback over headphones.
As discussed above known limitations of audio playback over loudspeakers include non-uniform directional spread and spectral coloration depending on panning direction. Moreover, sometimes listeners may perceive the locations of loudspeakers.
The concept as presented in the embodiments hereafter is the spatial extension of the output channel or loudspeaker signals, and the combining or summing of the spatially extended audio signals with the normal (or Vector Base Amplitude Panning (VBAP)-rendered) loudspeaker audio signals. The spatial extension of the loudspeaker audio signals may be chosen so that together they cover the whole 360 degrees. The result of the proposed processing is that it makes the loudspeaker reproduction more spatially enveloping and makes perceiving loudspeaker locations less obvious. This leads to better immersion as the listener has a smaller likelihood of perceiving the speaker locations but feels more fully immersed in the intended sound scene. Furthermore, the invention reduces changes in sound spectrum and spectral spread across different locations, without requiring any increase to the spectral spread of sounds. As the original loudspeaker signal is combined with the spatially extended signal, it is possible in some embodiments to maintain both a point-like perception of sounds when needed and still increase spatial envelopment and uniformity of sound reproduction across different spatial positions.
In some embodiments the spatial extension of the input channel or loudspeaker audio signal is selectively applied to some channels. For example in some embodiments the system spatially extends the loudspeaker audio signals for all channels except the centre channel. This is because the centre channel is conventionally the speech channel, and spatially extending the audio signals associated with the speech channel may produce artificial or unnatural-sounding sound scenes.
In some embodiments as discussed herein the input channels or input audio signals are analysed in order to determine whether spatially extending the audio signal is to be applied or whether the audio signal may be output without processing.
In some embodiments all of the channel (loudspeaker) audio signals are combined together and spatially extended or widened to a full 360 degrees. The spatially extended combined audio signals are then combined with the virtual loudspeaker or point sound source audio signals before rendering or outputting these.
In the following examples, audio signals and audio capture signals are described. However it would be appreciated that in some embodiments the apparatus may be part of any suitable electronic device or apparatus configured to capture an audio signal or receive the audio signals and other information signals.
With respect to figure 1 an example loudspeaker playback system employing spatially extended audio signal mixing suitable for implementing embodiments is shown. The example shown in figure 1 is a channel based audio signal system; however, as described hereafter, the system may in some embodiments be configured to receive object based audio signals.
A channel based audio signal is one wherein the sound scene is represented by one or more audio signals which represent the audio signals generated by playback equipment located in the listener's domain. As such, as shown in figure 1, there is a series of input channels (or loudspeaker channels), each of which is associated with an audio signal and each of which has a defined position with respect to the listener. Thus for example a channel based audio signal may come from a panned loudspeaker channel based system where the loudspeaker channels are in a 5.1 or other suitable channel format.
An object based audio signal is one wherein the sound scene is represented by one or more sound sources each of which have an audio signal and a defined position within the sound scene (and which may be mapped to a position with respect to the listener).
Thus for example figure 1 shows input channel 1 (loudspeaker 1) 101, input channel 2 (loudspeaker 2) 103, and input channel N (loudspeaker N) 105. The audio signals associated with the input channels can be passed to the spatially extending synthesiser 113 and to the mixer 111.
The system furthermore shows a spatially extending synthesizer 113. The spatial extending synthesiser 113 is configured to receive the input audio signals.
For example as shown in figure 1 the spatially extending synthesizer 113 is configured to receive the audio signals from the input channels (loudspeaker channels).
In some embodiments the spatial extending synthesiser 113 is further configured to receive audio signal positional information. This may, in the channel based example, be the loudspeaker channel position information or, in the object based audio signal input example (as discussed in further detail later), positional information associated with the sound source represented by the audio signal.
In some embodiments the spatial extending synthesiser 113 is further configured to receive a spatially extending control input. The spatially extending control input may be a user input to assist in the operation of spatially extending the audio signals as discussed in further detail later.
The spatial extending synthesiser 113 may be configured to spatially extend the audio signals and output the spatially extended input audio signal to the mixer 111. The concept associated with embodiments described herein is that the spatially extending synthesizer spatially extends each input loudspeaker channel (or rendered channel) to cover a certain area. For example, each loudspeaker channel in a 4.0 reproduction setup may be spatially extended to cover an area of 90 degrees around its own position.
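As a rough illustration of what covering an area of 90 degrees around a channel's own position could mean in practice, the sketch below spreads the frequency bands of one loudspeaker channel evenly over such an arc; the even spread is an assumption made here for illustration, not a detail specified in the text:

```python
import numpy as np

def band_positions(channel_azimuth_deg, num_bands, extent_deg=90.0):
    """Assign each frequency band of one loudspeaker channel a direction
    within an arc of extent_deg centred on the channel's own azimuth,
    so the channel is perceived from a region rather than a point."""
    offsets = np.linspace(-extent_deg / 2.0, extent_deg / 2.0, num_bands)
    return (channel_azimuth_deg + offsets) % 360.0

# Each channel of a 4.0 setup extended to cover 90 degrees around itself:
for azimuth in (45.0, 135.0, 225.0, 315.0):
    print(azimuth, band_positions(azimuth, num_bands=5))
```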
In some embodiments the system comprises a mixer 111 or combiner. The mixer is configured to receive the input audio signals (shown in figure 1 as the loudspeaker input channel audio signals) and associated spatially extended audio signals from the spatially extending synthesizers 113. In some embodiments the combiner may be configured to combine the audio signals by selectively enabling either the non-extended signal or the spatially extended signal. This may be seen as an OR operation applied to the audio signals. For example, when the sources are moving a lot a spatially extended version may be used, whereas when there is little movement the non-spatially extended (original) version may be used. Any suitable method of combining may be used.
The mixer 111 may furthermore be configured to receive a direct/extended control input (for example, as shown in figure 1, from the spatial extending synthesiser 113) configured to control the mix proportions of the direct (or input channel) audio signal and the spatially extended audio signal.
The mixer 111 is in some embodiments configured to output each mixed audio signal to a suitable output. Thus for example the mixer 111 shown in figure 1 is shown mixing the input channel 1 101 audio signal with the spatially extended audio signal channels. Input channel 1 contains the input for one loudspeaker channel. When it is fed to the spatially extending synthesizer, it is split into N output channels. Thus, the mixer mixes the original N channel signals with the N channel signals which contain the outputs of the spatially extended signals. In other words, the spatially extended version of channel 1 is carried in all N output channels of the spatially extending synthesizer, not just channel 1.
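A minimal sketch of this mixing stage follows, assuming the direct and spatially extended signals are available as equally shaped N-channel arrays; the weight values stand in for the direct/extended control input and the 0.7/0.3 default split is illustrative, not a value taken from the patent:

```python
import numpy as np

def mix_direct_and_extended(direct, extended, w_direct=0.7, w_extended=0.3):
    """Mix the original N-channel signals with the N-channel output of the
    spatially extending synthesizer.

    direct, extended: arrays of shape (num_channels, num_samples).
    The weights model the direct/extended control input of the mixer.
    """
    assert direct.shape == extended.shape
    return w_direct * direct + w_extended * extended

# The 'OR' style combination described above is the special case where one
# weight is 1.0 and the other is 0.0.
```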
With respect to figures 2 and 3 the application of the concepts discussed and embodied by the system shown in figure 1 is shown. Figures 2 and 3 both show an example 5.0 virtual loudspeaker configuration; however, any suitable number of (virtual) loudspeakers and any suitable configuration or arrangement of the loudspeakers may be implemented. Similarly any suitable number of audio signals may be employed. For example the input can also be a monophonic channel, which is then mixed to a maximum of two output channels by amplitude panning methods. When the monophonic channel source position is exactly at a position of a loudspeaker, it is emitted from a single output channel only. When the signal is in between loudspeakers, it is mixed to two output channels. In 3D loudspeaker configurations (with elevated speakers) the input signal in some embodiments is mixed to a maximum of three output channels (loudspeakers). In other words the following example shows an input audio signal which is a channel based audio signal wherein there are 5 input channels and 5 output channels. The 5.0 loudspeaker system shown in figures 2 and 3 comprises a front right virtual loudspeaker channel 203, front centre virtual loudspeaker channel 209, front left virtual loudspeaker channel 205, a rear left virtual loudspeaker channel 207 and a rear right virtual loudspeaker channel 201. Furthermore with respect to the virtual loudspeakers is shown a listener position 213. The listener position 213 is the position at which a user or listener of the system is positioned relative to the virtual loudspeaker channels. In the example shown in figures 2 and 3 the user or listener is configured to be listening to the audio signals via a set of headphones. However it would be understood that this system may be implemented with physical loudspeakers located in the listener's sound scene.
Furthermore, as shown in figures 2 and 3, there is a sound source 211 which is shown moving away from the front right virtual loudspeaker channel towards the rear right virtual loudspeaker channel.
The motion of the sound source is represented in figure 2, which shows an example whereby only direct audio signals (without any spatially extended audio signal components) are output, the associated audio signal gain (or signal level) from the front right virtual loudspeaker decreasing and the associated audio signal gain from the rear right virtual loudspeaker increasing. In such situations, as discussed above, the listener may become aware of the ‘loudspeakers’, which distracts from the listening experience. When a sound is played from two loudspeakers simultaneously, the listener may perceive a virtual sound object between the loudspeakers. However, the timbre of the panned source differs depending on its position between the loudspeakers: it is brightest when exactly in one loudspeaker and dullest when exactly in between the speakers.
The concept as implemented by the system shown in figure 1 is shown in operation in figure 3, which shows loudspeaker channel audio signals being spatially widened or extended 321. In other words the embodiments enable playback of not only the original sound but also spatially extended versions of each loudspeaker channel. The energy is divided between the point source and the extended version of the audio signal.
With respect to figure 4 an example of a spatially extending synthesiser 113 according to some embodiments is shown. The spatially extending synthesiser 113 shown in figure 4 is one which is configured to be able to accept both channel based input audio signals and object based audio signals. In some embodiments the spatially extending synthesizer is configured to accept one or the other of the audio signal input formats and as such may only comprise the features or components required to process that audio signal input format.
In some embodiments the spatially extending synthesiser 113 comprises an object/channel based signal determiner 1401. The object/channel based signal determiner 1401 is configured to determine whether or not the input signals are channel based or object based. For example the audio signals shown in figure 1 are channel based. The object/channel based signal determiner 1401 may be configured to control the processing or outputting of the input audio signals based on the determination.
Thus for example where the audio signal is object based, the sound source or object position information can be decoded from the input and passed directly to a cross channel analyser 1405 and to an object to channel renderer 1403. The object position information may also in some embodiments be represented with side information (or metadata or the like), for example with (azimuth, elevation, distance, timestamp) tuples which indicate the position of that sound object as polar coordinates (or in other co-ordinate systems) at a time indicated by the timestamp.
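For illustration, one such side-information entry could be held in a structure like the following; the field names and units are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ObjectPosition:
    """One (azimuth, elevation, distance, timestamp) side-information entry."""
    azimuth_deg: float    # direction in the horizontal plane
    elevation_deg: float  # direction above or below the horizontal plane
    distance_m: float     # distance from the reference listening position
    timestamp_s: float    # time at which the object is at this position
```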
In some embodiments where the audio signal is channel based, the audio signal can be passed to the cross-channel analyser 1405 and to a joint sound component determiner 1407.
In some embodiments the spatially extending synthesiser 113 comprises an object to channel renderer 1403. The object to channel renderer 1403 is configured to receive the object or sound source based audio signals and render the audio signals to an output channel format suitable for spatially extending. Thus for example the renderer is configured to apply a spatial mapping of the audio signal based on the positional information of the sound source or object. The output channel rendered audio signals can then be passed to a spatially extending processor 1411. In some embodiments the channel renderer 1403 function is implemented within a spatially extending synthesizer configured to receive a monophonic input (which splits the signal to N output channels) rather than the example shown where the spatially extending synthesizer is configured to receive input loudspeaker channels.
In some embodiments the spatially extending synthesiser 113 comprises a joint sound component determiner 1407. The joint sound component determiner 1407 can be configured to receive the audio signals, which are channel based audio signals, and determine components of the audio signals which are common.
These determined joint sound components can be passed to the cross channel analyser 1405.
In some embodiments the spatially extending synthesiser 113 comprises a cross channel analyser 1405. The cross channel analyser 1405 can be configured to receive the audio signals and determine the amount of cross-channel movement. For example where the audio signals are channel based audio signals this may be determined by analysing the level changes of joint sound components between channels. Where the audio signals are object based audio signals then this may be determined by analysis of the sound source or object motion. The sound source/object position data may be represented as polar coordinates (azimuth, elevation, distance) or Cartesian coordinates (x, y, z), and a timestamp indicating the time to which the position corresponds. The analyser may analyse the position data to determine how much movement there is. For example, in some embodiments the analyser is configured to determine the azimuth range of sound object positions across a certain time interval. If the azimuth range of a sound object exceeds a predetermined threshold in degrees (e.g. 10 degrees), then it is possible to determine that there is movement. The larger the range, the more movement there is. When the amount of movement exceeds a predetermined threshold, the spatial extent processing for channel signals may be enabled. Moreover, the amount of movement may adjust the direct to extended ratio of the mixer: the more movement there is, the more gain is added to the spatially extended signal.
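A minimal sketch of such an azimuth-range analysis follows; mapping the range linearly onto a 0..1 movement parameter, and the 10/90 degree constants, are illustrative assumptions:

```python
import numpy as np

def movement_parameter(azimuths_deg, threshold_deg=10.0, full_scale_deg=90.0):
    """Cross-channel movement parameter from object position metadata.

    azimuths_deg: object azimuths (degrees) over a recent time interval.
    Returns 0.0 while the azimuth range stays below the threshold, rising
    towards 1.0 as the range grows; the result can be used directly as the
    gain of the spatially extended signal in the mixer. Azimuth wrap-around
    at 360 degrees is ignored for simplicity.
    """
    span = np.ptp(np.asarray(azimuths_deg, dtype=float))  # max - min
    if span <= threshold_deg:
        return 0.0
    return min(1.0, (span - threshold_deg) / (full_scale_deg - threshold_deg))

print(movement_parameter([30, 32, 31, 33]))    # little movement -> 0.0
print(movement_parameter([10, 40, 70, 100]))   # large sweep     -> 1.0
```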
The cross-channel analyser 1405 can be configured to output the results of the analysis to a spatially extending channel controller 1409.
In some embodiments the spatially extending synthesiser 113 comprises a spatially extending channel controller 1409. The spatially extending channel controller 1409 is configured to receive the output of the cross channel analyser 1405 and determine whether or not the motion of the cross channel component is sufficient to require a spatial extension of the audio signal. Furthermore the controller in some embodiments is configured to determine specific spatially extending control signals to control the amount of spatial extension to be applied by the spatially extending synthesiser/processor 1411 based on the movement of the cross channel component. Also the controller in some embodiments is configured to determine control signals to control the mixer and thus control the proportion of the spatially extended audio signal to be combined with the audio signal within the mixer 111.
In some embodiments the spatial extending synthesiser 113 comprises a spatial extending synthesiser/processor 1411. The spatially extending synthesiser/processor 1411 is configured to receive the audio signal for spatially extending (for example from the object to channel renderer for an object based audio signal or from the input directly for a channel based audio signal) and furthermore control parameters for controlling the spatially extending from the spatially extending channel controller 1409. The spatially extending synthesiser/processor 1411 may thus spatially extend the audio signal based on the control parameters and output a spatially extended audio signal to the mixer 111.
With respect to figure 5 an example operation of the spatially extending synthesiser shown in figure 4 (and the mixer shown in figures 1 and 4) is shown by a flow diagram. As discussed above the synthesizer shown in figure 4 is one suitable for receiving both object based and channel based inputs. A similar but pruned flow diagram may be implemented for a synthesizer configured to receive only one of the audio input formats.
Firstly the input audio signals are determined to be either object or channel based as shown in figure 5 by step 501.
Where the input audio signals are determined to be channel based then the operation is configured to determine joint sound components between channels as shown in figure 5 by step 507.
Furthermore the amount of cross-channel movement may then be determined by analysing the level changes of joint sound components between channels as shown in figure 5 by step 509.
Where the input is determined to be object based then the amount of cross-channel movement may be determined by an analysis of the object or sound source position data as shown in figure 5 by step 505.
For both object and channel based audio signals the next step is to determine spatially extending control parameters for the channels based on the amount of cross channel movement as shown in figure 5 by step 511.
Where the audio signals are object based then the audio signal objects are rendered to channel audio signals as shown in figure 5 by step 503.
For both object (which are now rendered as channel based audio signals) and channel based input audio signals the spatial extending synthesis is applied to the channel based audio signals based on the control parameters as shown in figure 5 by step 515.
Then, for each channel, the original or rendered audio signals (the direct audio signals) and the spatially extended audio signals are combined or mixed based on the control signals as shown in figure 5 by step 516.
The mixed audio signal channels are then output as shown in figure 5 by step 517.
With respect to figure 6 further detail of the joint sound determiner 1407 and cross-channel analyser 1405 as implemented in some embodiments is shown (in other words where the input signal is a channel based signal such as shown in figure 1).
In some embodiments the joint sound component determiner comprises a time to frequency domain transformer 601, such as a Short Time Fourier Transform (STFT). The time to frequency domain transformer 601 is configured to receive the input channel based audio signals and determine suitable frequency domain representations. The channel-based signal thus may be subjected to short-time Fourier transform (STFT) analysis, using, for example, a temporal analysis window of 20 ms in length. The frequency domain representations can be passed to a sub-band filter 603.
In some embodiments the joint sound component determiner 1407 further comprises a sub-band filter configured to receive the frequency representations of the input audio signals and generate sub-band groups of the frequency domain representations. The sub-band filter thus may be configured to determine 32 frequency bands. The sub-band filter may for example be configured to generate Equivalent Rectangular Bandwidth (ERB) determined frequency bands. The audio channels in the frequency domain are represented by $X_k(n)$. The frequency domain representation may be divided into $B$ subbands

$$X_k^b(n) = X_k(n_b + n), \quad n = 0, \ldots, n_{b+1} - n_b - 1, \quad b = 0, \ldots, B - 1$$

where $n_b$ is the first index of the $b$th subband, $n$ is the discrete frequency index and $k$ is the channel index. The sub-band filter is configured to output these sub-band signals to a band-wise correlator 605.
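The subband division above maps directly to code; the following is a minimal sketch, in which the ERB band-edge computation itself is omitted and the edge indices are assumed to be given:

```python
def split_into_subbands(X_k, band_edges):
    """Split one channel's spectrum X_k(n) into B subbands.

    X_k:        complex STFT spectrum of one frame (array of num_bins values).
    band_edges: the first bin index n_b of every band plus a final end
                index, e.g. ERB-spaced edges for 32 bands (length B + 1).
    Returns a list of B arrays X_k^b(n) = X_k(n_b + n).
    """
    return [X_k[band_edges[b]:band_edges[b + 1]]
            for b in range(len(band_edges) - 1)]
```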
In some embodiments the joint sound component determiner 1407 comprises a band-wise correlator 605. The band-wise correlator 605 may be configured to correlate (neighbouring) channel audio signals band-wise to determine the level of correlation between these audio signals. The output of the band-wise correlator 605 can be configured to be output to a joint sound component analyser 607.

In some embodiments the joint sound component determiner comprises a joint sound component analyser 607. The joint sound component analyser 607 is configured to compare the band-wise correlation outputs to a determined threshold value to determine whether or not there are joint sound components within the audio signals with sufficient similarity which may be used to determine motion within the neighbouring channels.
In some embodiments the band-wise correlator is configured to find a delay $\tau_b$ that maximizes the correlation between two channels for sub-band $b$. This can be accomplished by creating time-shifted versions of the signal for a channel (e.g. in channel 2), and correlating these with the signal for another channel (e.g. on channel 3). A time shift of $\tau$ time domain samples of $X_k^b(n)$ can be obtained as

$$X_{k,\tau}^b(n) = X_k^b(n)\, e^{-j \frac{2 \pi n \tau}{N}},$$

where $N$ is the transform length. Now the optimal delay $\tau_b$ is obtained from

$$\max_{\tau_b} \operatorname{Re}\left( X_{2,\tau_b}^{b\,*} X_3^b \right), \qquad \tau_b \in [-D_{max}, D_{max}],$$

where $\operatorname{Re}$ indicates the real part of the result and $*$ denotes combined transpose and complex conjugate operations. $X_{2,\tau_b}^b$ and $X_3^b$ are considered vectors with a length of $n_{b+1} - n_b$ samples. The range of searching for the delay, $D_{max}$, is selected such that it covers the expected time differences between loudspeaker channels depending on the setup.
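A direct sketch of this delay search in Python follows; the brute-force loop over candidate delays and the variable names are illustrative.

import numpy as np

def best_delay(X2_b, X3_b, n_b, N, d_max):
    """Find tau_b maximising Re(X2_tau^H X3) for one subband.
    X2_b, X3_b: complex subband spectra; n_b: first bin index of band b;
    N: transform length; d_max: delay search range in samples."""
    bins = n_b + np.arange(len(X2_b))                  # absolute bin indices n
    best_tau, best_corr = 0, -np.inf
    for tau in range(-d_max, d_max + 1):
        shifted = X2_b * np.exp(-1j * 2.0 * np.pi * bins * tau / N)
        corr = np.real(np.vdot(shifted, X3_b))         # vdot conjugates its first argument
        if corr > best_corr:
            best_tau, best_corr = tau, corr
    return best_tau, best_corr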
Thus where the correlation exceeds a predetermined value, the band b on channels 2 and 3 may be determined to contain the same content.
The results of the analysis in some embodiments are output to the cross-channel analyser 1405 and a gain change determiner.
In some embodiments the cross-channel analyser 1405 comprises a gain change determiner 611. The gain change determiner may be configured to compare the joint sound components to determine the change of signal levels between them, in other words to determine whether or not the audio source is moving from one channel to another channel by analysing the signal levels of the joint sound components. In other words the movement of sound between loudspeaker channels can be determined by observing amplitude panning type behaviour of frequency bands in the channel input. Thus for example when band b was determined to contain similar content for channels 2 and 3, the system may continue to monitor the level (energy) of band b in channels 2 and 3 over a certain number of processing frames (for example, 10 frames). If during this time it is observed that the energy reduces on one channel and increases on the other, the content of that band is moving across the channels. The more frequency bands are moving simultaneously, the more movement there is in the sound scene.
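The sketch below illustrates one way such a level-trend test could be realised; the linear-trend fit over a 10-frame window is an assumed detection heuristic, not a method mandated by the embodiments.

import numpy as np

def band_is_moving(e_ch2, e_ch3, min_slope=1e-6):
    """e_ch2, e_ch3: energies of one joint band over e.g. 10 consecutive
    frames. Returns True when the energy falls on one channel while rising
    on the other, i.e. amplitude-panning-like movement across the channels."""
    frames = np.arange(len(e_ch2))
    slope2 = np.polyfit(frames, e_ch2, 1)[0]   # linear energy trend, channel 2
    slope3 = np.polyfit(frames, e_ch3, 1)[0]   # linear energy trend, channel 3
    return (slope2 < -min_slope and slope3 > min_slope) or \
           (slope3 < -min_slope and slope2 > min_slope)

def movement_amount(moving_flags):
    """The more bands move simultaneously, the more movement in the scene."""
    return float(np.mean(moving_flags))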
With respect to figure 7 an example spatially extending synthesiser is shown in further detail. As described herein the spatially extending synthesiser receives the original or rendered channel audio signals and spatially extends the audio signals to a defined spatial extent based on the spatially extending control parameters such as those generated by the spatially extending channel controller. In other words the synthesizer takes as input a mono sound source audio signal and spatially extending parameters (width, height and depth).
In some embodiments, where the audio signal input is a time domain signal, the spatially extending synthesiser comprises a suitable time to frequency domain transformer. For example as shown in figure 7 the spatially extending synthesiser comprises a Short-Time Fourier Transform (STFT) 401 configured to receive the audio signal and output a suitable frequency domain output. In some embodiments the input is a time-domain signal which is processed with a hop size of 512 samples. A processing frame of 1024 samples is used, formed from the current 512 samples and the previous 512 samples. The processing frame is zero-padded to twice its length (2048 samples) and Hann windowed. The Fourier transform is calculated from the windowed frame, producing the Short-Time Fourier Transform (STFT) output. The STFT output is symmetric, thus it is sufficient to process the positive half of 1024 samples plus the DC component, totalling 1025 samples. Although the STFT is shown in figure 7, any suitable time to frequency domain transform may be used.
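A minimal sketch of this framing stage follows; windowing the 1024-sample frame before the zero-padded transform is an assumed ordering, since the text leaves it open.

import numpy as np

HOP, FRAME, NFFT = 512, 1024, 2048
window = np.hanning(FRAME)

def analysis_frames(x):
    """Yield 1025-bin spectra of a mono signal x: 1024-sample Hann-windowed
    frames at a 512-sample hop, zero-padded to 2048 points by rfft."""
    for start in range(0, len(x) - FRAME + 1, HOP):
        frame = x[start:start + FRAME] * window
        yield np.fft.rfft(frame, n=NFFT)   # positive half + DC = 1025 bins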
In some embodiments the spatially extending synthesiser further comprises a filter bank 403. The filter bank 403 is configured to receive the output of the STFT 401 and, using a set of filters generated based on a Halton sequence (and with some default parameters), generate a number of frequency bands 405. In statistics, Halton sequences are sequences used to generate points in space for numerical methods such as Monte Carlo simulations. Although these sequences are deterministic, they are of low discrepancy, that is, they appear to be random for many purposes. In some embodiments the filter bank 403 comprises a set of 9 different distribution filters, which are used to create 9 different frequency domain signals where the signals do not contain overlapping frequency components. These signals are denoted Band 1 F 405₁ to Band 9 F 405₉ in figure 7. The filtering can be implemented in the frequency domain by multiplying the STFT output with stored filter coefficients for each band.
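The embodiments do not spell out how the Halton sequence yields the distribution filters; the sketch below shows one plausible reading, in which a base-2 Halton (van der Corput) value per frequency bin assigns each bin to exactly one of the 9 bands, so the bands cannot overlap. The exact construction used may differ.

import numpy as np

def halton_base2(i):
    """i-th element of the base-2 Halton (van der Corput) sequence, in [0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= 2.0
        r += f * (i % 2)
        i //= 2
    return r

def distribution_filters(n_bins=1025, n_bands=9):
    """Boolean masks: filters[b, n] selects the bins of band b. Every bin is
    assigned to exactly one band, so the bands do not overlap."""
    band_of_bin = np.array([int(halton_base2(i + 1) * n_bands)
                            for i in range(n_bins)])
    return np.array([band_of_bin == b for b in range(n_bands)])

# Frequency domain filtering: band_signals[b] = filters[b] * stft_frame.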
In some embodiments the spatially extending synthesiser further comprises a spatially extending input 400. The spatially extending input 400 may be configured to define the spatial extent applied to the audio signal.
Furthermore in some embodiments the spatially extending synthesiser may further comprise an object/channel position input/determiner 402. The object/channel position input/determiner 402 may be configured to determine the spatial position of the sound sources.
In some embodiments the spatially extending synthesiser may further comprise a band position determiner 404. The band position determiner 404 may be configured to receive the outputs from the object/channel position input/determiner 402 and the spatially extending input 400 and from these generate an output passed to the vector base amplitude panning processor 406.
In some embodiments the spatially extending synthesiser may further comprise a vector base amplitude panning (VBAP) processor 406. The VBAP processor 406 may be configured to generate control signals to control the panning of the frequency domain signals to desired spatial positions. Given the spatial position of the sound source (azimuth, elevation) and the desired spatial extent for the source (width in degrees), the system calculates a spatial position for each frequency domain signal. For example, if the spatial position of the sound source is zero degrees azimuth (front), and the spatial extent is 90 degrees, the VBAP may position the frequency bands at azimuths 45, 33.75, 22.5, 11.25, 0, -11.25, -22.5, -33.75 and -45 degrees. Thus, a linear allocation of bands around the source position is used, with the span defined by the spatial extent. In some other embodiments, nonlinear allocations or other arrangements of bands might be used. In some embodiments, the span of the bands might not exactly match the desired spatial extent but may be smaller or larger. In particular, models of human sound perception may be used to compensate for the difference between the perceived and the synthesized spatial extent, in particular in cases where humans may perceive sound sources as narrower than they are actually rendered. Spatial extension in the elevation (height) domain may be performed in the same manner as for the azimuth (width) domain above. Spatial extension in the depth domain may also be performed in some embodiments by rendering some bands at different depths using known methods for sound distance rendering. Methods for sound distance rendering include, for example, adding distance attenuation, with sounds further away being quieter. The distance attenuation may be implemented with the 1/r rule, where r is the distance. Alternatively or in addition to adding distance attenuation, the distance rendering may be performed by modifying the direct to reverberant ratio of the sound: if there is any reverberation present, the reverberation energy usually stays constant while the direct signal energy becomes smaller. Furthermore distance rendering may be performed by attenuating early reflections in the reverberation so that they become quieter and sparser with increasing distance. In some embodiments a low-pass filter can be implemented to approximate the attenuation of higher frequency components as the distance increases.
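The linear band allocation and the 1/r attenuation can be sketched in a few lines; the function names and the reference distance are illustrative.

import numpy as np

def band_azimuths(source_az_deg, extent_deg, n_bands=9):
    """Linear allocation of band positions around the source azimuth, e.g.
    source at 0 deg with a 90 deg extent and 9 bands gives
    [45, 33.75, 22.5, 11.25, 0, -11.25, -22.5, -33.75, -45]."""
    half = extent_deg / 2.0
    return source_az_deg + np.linspace(half, -half, n_bands)

def distance_gain(r, r_ref=1.0):
    """1/r distance attenuation relative to a reference distance r_ref."""
    return r_ref / max(r, 1e-3 * r_ref)   # guard against r == 0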
The VBAP processor 406 may therefore be used to calculate a suitable gain for the signal, given the desired loudspeaker positions. The VBAP processor 406 may provide gains for a signal such that it can be spatially positioned to a suitable position. These gains may be passed to a series of multipliers 407. In the following example the spatial extent synthesiser (or spatially extending controller) is implemented using a vector base amplitude panning operation. However it is understood that the spatial extent synthesis or spatially extending control may be implementation agnostic and any suitable implementation may be used to generate the spatially extending control. For example in some embodiments the spatially extending control may implement direct binaural panning (using head related transfer function filters for directions), direct assignment to the output channel locations (for example direct assignment to the loudspeakers without using any panning), synthesized ambisonics, or wave-field synthesis.
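For illustration, a minimal two-dimensional (azimuth-only) pairwise VBAP sketch is given below; the embodiments may equally use the full three-dimensional formulation, and the normalisation choice here is an assumption.

import numpy as np

def vbap_2d(source_az_deg, speaker_az_deg):
    """Return per-loudspeaker gains panning a source to source_az_deg using
    2D pairwise VBAP: solve L g = p for the active loudspeaker pair, where
    the columns of L are the pair's unit direction vectors."""
    p = np.array([np.cos(np.radians(source_az_deg)),
                  np.sin(np.radians(source_az_deg))])
    az = np.radians(np.asarray(speaker_az_deg, dtype=float))
    order = np.argsort(az)
    gains = np.zeros(len(az))
    for i in range(len(order)):                 # try each adjacent pair
        a, b = order[i], order[(i + 1) % len(order)]
        L = np.array([[np.cos(az[a]), np.cos(az[b])],
                      [np.sin(az[a]), np.sin(az[b])]])
        g = np.linalg.solve(L, p)               # assumes the pair is not collinear
        if np.all(g >= -1e-9):                  # source lies between the pair
            gains[[a, b]] = g / np.linalg.norm(g)   # energy normalisation
            break
    return gains

# e.g. vbap_2d(0.0, [-45, 45, 135, -135]) -> equal gains on the front pair.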
In some embodiments the spatially extending synthesiser may further comprise a series of multipliers 407. Figure 7 shows one multiplier for each frequency band; thus the series comprises multipliers 407₁ to 407₉, however any suitable number of multipliers may be used. Each frequency domain band signal may be multiplied in the multiplier 407 with the determined VBAP gains.
The products of the VBAP gains and each frequency band signal may be passed to a series of output channel sum devices 409.
In some embodiments the spatially extending synthesiser 215 may further comprise a series of sum devices 409. The sum devices 409 may receive the outputs from the multipliers and combine them to generate an output channel band signal 411. In the example shown in figure 7, a 4.0 loudspeaker format output is implemented with outputs for front left (Band FL F 411₁), front right (Band FR F 411₂), rear left (Band RL F 411₃), and rear right (Band RR F 411₄) channels, which are generated by sum devices 409₁, 409₂, 409₃ and 409₄ respectively. In some other embodiments other loudspeaker formats or numbers of channels can be supported.
Furthermore in some embodiments other panning methods can be used such as panning laws, or the signals could be assigned to the closest loudspeakers directly.
In some embodiments the spatially extending synthesiser may further comprise a series of inverse Short-Time Fourier Transforms (ISTFT) 413. For example as shown in figure 7 there is an ISTFT 413₁ associated with the FL signal, an ISTFT 413₂ associated with the FR signal, an ISTFT 413₃ associated with the RL signal output and an ISTFT 413₄ associated with the RR signal. In other words the synthesiser provides N component audio signals to be played from different directions based on the spatially extending parameters. The signals are subjected to an Inverse Short-Time Fourier Transform (ISTFT) and overlap-added to produce time-domain outputs.
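A sketch of this inverse stage, mirroring the framing described for figure 7, is given below; window compensation is omitted for brevity.

import numpy as np

HOP, NFFT = 512, 2048

def istft_overlap_add(frames, n_samples):
    """frames: iterable of 1025-bin spectra; returns a mono time signal of
    length n_samples reconstructed by inverse FFT and overlap-add."""
    out = np.zeros(n_samples + NFFT)
    for i, F in enumerate(frames):
        segment = np.fft.irfft(F, n=NFFT)        # back to 2048 time samples
        out[i * HOP:i * HOP + NFFT] += segment   # overlap-add at the hop
    return out[:n_samples]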
These component signals may be provided for rendering and also for analysis for the purpose of ensuring even energy distributions between the components.
In some embodiments, as discussed earlier, each loudspeaker or channel audio signal may be selectively spatially extended. Thus for example in some embodiments all of the loudspeaker channels except the centre channel may be processed, while the centre channel is left unprocessed since it often contains speech (e.g. in movie 5.1 mixes).
In some embodiments the channel processing may be controlled based on a speech or voice activity detector analysis of the audio signal. Thus for the centre channel (or any other channel) it is determined whether the audio signal comprises mostly speech. If the audio signal is mostly speech, then that channel may receive a smaller spatial extent value or not be spatially extended at all.
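A sketch of such speech-aware extent control is shown below; speech_ratio() is a hypothetical helper standing in for whatever voice activity detector is used, and the threshold and extent values are illustrative.

def extent_for_channel(audio, default_extent_deg=90.0,
                       speech_threshold=0.5, speech_extent_deg=0.0):
    """Channels that are mostly speech receive a reduced (here zero) extent."""
    ratio = speech_ratio(audio)   # hypothetical VAD: fraction of speech frames
    return speech_extent_deg if ratio > speech_threshold else default_extent_deg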
In some embodiments all channel or loudspeaker audio signals are summed together and this summed audio signal is then spatially widened to 360 degrees (or another suitable extent) before being combined with the original loudspeaker audio signals.
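This variant can be sketched as follows; spatial_extend() is a hypothetical stand-in for the spatially extending synthesiser described above, and the blend gain g is illustrative.

import numpy as np

def widen_scene(channels, g=0.5):
    """channels: (K, T) loudspeaker signals -> (K, T) output with a 360 degree
    widened downmix blended back onto the originals."""
    downmix = channels.sum(axis=0)
    widened = spatial_extend(downmix, extent_deg=360.0,     # hypothetical
                             n_channels=channels.shape[0])  # synthesiser call
    return channels + g * widened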
In some further embodiments the user may be able to use the spatially extending control input to affect the amount of spatially extending present in the output signal. This may allow a tuning of the reproduction between normal, almost point-like, audio signals and widely spread signals.
These embodiments may be implemented when the user starts to listen to a sound scene which contains a sound source or object moving around him. Firstly, the user may listen to the sound scene with the method disabled (no spatial extension applied). As the sound object moves around the user, the user hears its sound changing depending on whether the sound comes exactly from the direction of a loudspeaker or from between two loudspeakers. The sound is a bit dull between speakers, and it also sounds a bit larger. At the location of a speaker, the sound is sharp, clear, and narrow. The user may not be fully convinced about the reproduction quality and does not feel fully immersed. The user may then enable the proposed processing (or in other embodiments the processing may be enabled automatically). After the processing is enabled, the user would experience an increased immersion as the sound starts to fill the whole room. Moreover, as the sound moves around, the user would no longer hear timbral or spatial spread changes; the only thing which changes is the object's spatial position. The user is likely to be much happier with the sound quality.
With respect to Figure 8 an example electronic device which may be used as the mixer and/or ambience signal generator is shown. The device may be any suitable electronic device or apparatus. For example in some embodiments the device 1200 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
The device 1200 may comprise a microphone 1201. The microphone 1201 may comprise a plurality (for example a number N) of microphones. However it is understood that there may be any suitable configuration of microphones and any suitable number of microphones. In some embodiments the microphone 1201 is separate from the apparatus and the audio signals are transmitted to the apparatus by a wired or wireless coupling. The microphone 1201 may in some embodiments be the microphone array as shown in the previous figures.
The microphone may be a transducer configured to convert acoustic waves into suitable electrical audio signals. In some embodiments the microphones can be solid state microphones, in other words capable of capturing audio signals and outputting a suitable digital format signal. In some other embodiments the microphone 1201 can comprise any suitable microphone or audio capture means, for example a condenser microphone, capacitor microphone, electrostatic microphone, electret condenser microphone, dynamic microphone, ribbon microphone, carbon microphone, piezoelectric microphone, or micro-electrical-mechanical system (MEMS) microphone. The microphone can in some embodiments output the captured audio signal to an analogue-to-digital converter (ADC) 1203.
The device 1200 may further comprise an analogue-to-digital converter 1203. The analogue-to-digital converter 1203 may be configured to receive the audio signals from each of the microphones 1201 and convert them into a format suitable for processing. In some embodiments where the microphone is an integrated microphone the analogue-to-digital converter is not required. The analogue-to-digital converter 1203 can be any suitable analogue-to-digital conversion or processing means. The analogue-to-digital converter 1203 may be configured to output the digital representations of the audio signal to a processor 1207 or to a memory 1211.
In some embodiments the device 1200 comprises at least one processor or central processing unit 1207. The processor 1207 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 1200 comprises a memory 1211. In some embodiments the at least one processor 1207 is coupled to the memory 1211. The memory 1211 can be any suitable storage means. In some embodiments the memory 1211 comprises a program code section for storing program codes implementable upon the processor 1207. Furthermore in some embodiments the memory 1211 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1207 whenever needed via the memory-processor coupling.
In some embodiments the device 1200 comprises a user interface 1205. The user interface 1205 can be coupled in some embodiments to the processor 1207. In some embodiments the processor 1207 can control the operation of the user interface 1205 and receive inputs from the user interface 1205. In some embodiments the user interface 1205 can enable a user to input commands to the device 1200, for example via a keypad. In some embodiments the user interface 1205 can enable the user to obtain information from the device 1200. For example the user interface 1205 may comprise a display configured to display information from the device 1200 to the user. The user interface 1205 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1200 and further displaying information to the user of the device 1200. In some embodiments the user interface 1205 may be the user interface for communicating with the position determiner as described herein.
In some embodiments the device 1200 comprises a transceiver 1209. The transceiver 1209 in such embodiments can be coupled to the processor 1207 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver 1209 or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
For example as shown in Figure 11 the transceiver 1209 may be configured to communicate with the renderer as described herein.
The transceiver 1209 can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver 1209 or transceiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IRDA).
In some embodiments the device 1200 may be employed as at least part of the renderer. As such the transceiver 1209 may be configured to receive the audio signals and positional information from the microphone/close microphones/position determiner as described herein, and generate a suitable audio signal rendering by using the processor 1207 executing suitable code. The device 1200 may comprise a digital-to-analogue converter 1213. The digital-to-analogue converter 1213 may be coupled to the processor 1207 and/or memory 1211 and be configured to convert digital representations of audio signals (such as from the processor 1207 following an audio rendering of the audio signals as described herein) to a suitable analogue format suitable for presentation via an audio subsystem output. The digital-to-analogue converter (DAC) 1213 or signal processing means can in some embodiments be any suitable DAC technology.
Furthermore the device 1200 can comprise in some embodiments an audio subsystem output 1215. An example as shown in Figure 8 shows the audio subsystem output 1215 as an output socket configured to enable a coupling with headphones 121. However the audio subsystem output 1215 may be any suitable audio output or a connection to an audio output. For example the audio subsystem output 1215 may be a connection to a multichannel speaker system.
In some embodiments the digital to analogue converter 1213 and audio subsystem 1215 may be implemented within a physically separate output device. For example the DAC 1213 and audio subsystem 1215 may be implemented as cordless earphones communicating with the device 1200 via the transceiver 1209.
Although the device 1200 is shown having audio capture, audio processing and audio rendering components, it would be understood that in some embodiments the device 1200 can comprise just some of these elements.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, and CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or fab for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims (60)

CLAIMS:
1. An apparatus for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the apparatus configured to:
receive and/or determine at least one channel audio signal associated with the at least one sound source;
generate at least one spatially extended audio signal based on the at least one channel audio signal;
combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
2. The apparatus as claimed in claim 1, wherein the apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source is configured to receive at least two neighbouring channel audio signals.
3. The apparatus as claimed in claim 2, further configured to analyse the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein the apparatus configured to generate the at least one spatially extended audio signal is configured to apply a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis is controlled based on the at least one cross-channel movement parameter.
4. The apparatus as claimed in claim 3, wherein the apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals is configured to combine:
a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
5. The apparatus as claimed in any of claims 3 and 4, wherein the apparatus configured to analyse the at least one channel audio signal to determine at least one cross-channel movement parameter is further configured to:
determine at least one joint audio component within the at least two neighbouring channel audio signals; and determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
6. The apparatus as claimed in claim 5, wherein the apparatus configured to determine at least one joint audio component within the at least two neighbouring channel audio signals is further configured to determine at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
7. The apparatus as claimed in any of claims 5 and 6, wherein the apparatus configured to determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals is configured to determine a cross-channel movement parameter based on determining at least one of:
a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
8. The apparatus as claimed in claim 1, wherein the apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source within the sound scene is configured to:
receive a sound source based audio signal;
generate the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
9. The apparatus as claimed in claim 8, wherein the apparatus is further configured to analyse position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
10. The apparatus as claimed in claim 9, wherein the apparatus configured to generate the at least one spatially extended audio signal is configured to apply a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis is controlled based on the at least one cross-channel movement parameter.
11. The apparatus as claimed in any of claims 9 and 10, wherein the apparatus configured to analyse position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter is configured to determine a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
12. The apparatus as claimed in any of claims 3 to 7 or 9 to 11, wherein the apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals is configured to:
determine at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter;
combine the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
13. The apparatus as claimed in any of the claims 1 to 12, wherein the apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal is configured to apply at least one of:
a vector base amplitude panning to the at least one audio signal;
direct binaural panning;
direct assignment to channel output location;
synthesized ambisonics; and wavefield synthesis.
14. The apparatus as claimed in claim 13, wherein the apparatus configured to apply a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal is configured to:
determine a spatially extending parameter; and determine at least one position associated with the at least one channel audio signal;
determine at least one frequency band position based on the at least one position and the spatial extent parameter; and generate panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
15. The apparatus as claimed in claims 13 to 14, further configured to determine a position of the at least one channel relative to the apparatus.
16. The apparatus as claimed in any of claims 13 to 15 when based on any of claims 3 to 7 or 9 to 11, wherein the spatially extending synthesis vector base amplitude panning is configured to be controlled such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
17. The apparatus as claimed in any of claims 1 to 16, wherein the at least two channel audio signals are one of:
loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
18. The apparatus as claimed in any of claims 1 to 17, wherein the apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal is configured to generate at least one of:
at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
19. The apparatus as claimed in any of claims 1 to 18, configured to receive and/or determine at least one channel audio signal associated with the at least one sound source is configured to receive at least two channel audio signals, wherein the apparatus is configured to selectively generate the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combine the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
20. The apparatus as claimed in any of claims 1 to 19, configured to receive at least one channel audio signal is further configured to receive at least two audio signals, wherein at least one of the at least two audio signals is a channel based audio signal and at least one of the at least two audio signals is an object-based audio signal from which further channel based audio signals are determined.
21. A method for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the method comprising:
receiving and/or determining at least one channel audio signal associated with the at least one sound source;
generating at least one spatially extended audio signal based on the at least one channel audio signal;
combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
22. The method as claimed in claim 21, wherein receiving and/or determining at least one channel audio signal associated with the at least one sound source comprises receiving at least two neighbouring channel audio signals.
23. The method as claimed in claim 22, further comprising analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein generating the at least one spatially extended audio signal comprises applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis is controlled based on the at least one cross-channel movement parameter.
24. The method as claimed in claim 23, wherein combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals comprises combining:
a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
25. The method as claimed in any of claims 23 and 24, wherein analysing the at least one channel audio signal to determine at least one cross-channel movement parameter further comprises:
determining at least one joint audio component within the at least two neighbouring channel audio signals; and determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
26. The method as claimed in claim 25, wherein determining at least one joint audio component within the at least two neighbouring channel audio signals comprises determining at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
27. The method as claimed in any of claims 25 and 26, wherein determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals comprises determining a cross-channel movement parameter based on determining at least one of:
a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
28. The method as claimed in claim 21, wherein receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene comprises:
receiving a sound source based audio signal;
generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
29. The method as claimed in claim 28, further comprising analysing position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
30. The method as claimed in claim 29, wherein generating the at least one spatially extended audio signal comprises applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis is controlled based on the at least one cross-channel movement parameter.
31. The method as claimed in any of claims 29 and 30, wherein analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter comprises determining a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
32. The method as claimed in any of claims 23 to 27 or 29 to 31, wherein combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals comprises:
determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter;
combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
33. The method as claimed in any of the claims 31 to 32, wherein generating at least one spatially extended audio signal based on the at least one channel audio signal comprises applying at least one of:
a vector base amplitude panning to the at least one audio signal;
direct binaural panning;
direct assignment to channel output location;
synthesized ambisonics; and wavefield synthesis.
34. The method as claimed in claim 33, wherein applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal comprises:
determining a spatially extending parameter;
determining at least one position associated with the at least one channel audio signal;
determining at least one frequency band position based on the at least one position and the spatial extent parameter; and generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
35. The method as claimed in claims 33 to 34, further comprising determining a position of the at least one channel relative to the apparatus.
36. The method as claimed in any of claims 33 to 35 when based on any of claims 23 to 27 or 29 to 31, wherein controlling the spatially extending synthesis vector base amplitude panning is such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
37. The method as claimed in any of claims 21 to 36, wherein the at least two channel audio signals are one of:
loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
38. The method as claimed in any of claims 21 to 37, wherein generating at least one spatially extended audio signal based on the at least one channel audio signal comprises generating at least one of:
at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
39. The method as claimed in any of claims 21 to 38, wherein receiving and/or determining at least one channel audio signal associated with the at least one sound source comprises receiving at least two channel audio signals, wherein the method comprises:
selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
40. The method as claimed in any of claims 21 to 39, wherein receiving at least one channel audio signal further comprises receiving at least two audio signals, wherein at least one of the at least two audio signals is a channel based audio signal and at least one of the at least two audio signals is an object-based audio signal from which further channel based audio signals are determined.
41. An apparatus for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source, the apparatus comprising:
means for receiving and/or determining at least one channel audio signal associated with the at least one sound source;
means for generating at least one spatially extended audio signal based on the at least one channel audio signal;
means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
42. The apparatus as claimed in claim 41, wherein means for receiving and/or determining at least one channel audio signal associated with the at least one sound source comprises means for receiving at least two neighbouring channel audio signals.
43. The apparatus as claimed in claim 42, further comprising means for analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein means for generating the at least one spatially extended audio signal comprises means for applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis is controlled based on the at least one cross-channel movement parameter.
44. The apparatus as claimed in claim 43, wherein means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals comprises means for combining:
a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
45. The apparatus as claimed in any of claims 43 and 44, wherein means for analysing the at least one channel audio signal to determine at least one cross-channel movement parameter further comprises:
means for determining at least one joint audio component within the at least two neighbouring channel audio signals; and means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
46. The apparatus as claimed in claim 45, wherein means for determining at least one joint audio component within the at least two neighbouring channel audio signals comprises means for determining at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
47. The apparatus as claimed in any of claims 45 and 46, wherein means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals comprises means for determining a cross-channel movement parameter based on determining at least one of:
a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
48. The apparatus as claimed in claim 41, wherein means for receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene comprises:
means for receiving a sound source based audio signal;
means for generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
49. The apparatus as claimed in claim 48, further comprising means for analysing position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
50. The apparatus as claimed in claim 49, wherein means for generating the at least one spatially extended audio signal comprises means for applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis is controlled based on the at least one cross-channel movement parameter.
51. The apparatus as claimed in any of claims 49 and 50, wherein means for analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter comprises means for determining a parameter based on a change of an azimuth associated with the audio object position over a determined time period, when the change of the azimuth exceeds a predetermined value.
52. The apparatus as claimed in any of claims 43 to 47 or 49 to 51, wherein means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals comprises:
means for determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter;
means for combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
53. The apparatus as claimed in any of the claims 51 to 52, wherein means for generating at least one spatially extended audio signal based on the at least one channel audio signal comprises means for applying at least one of:
a vector base amplitude panning to the at least one audio signal;
direct binaural panning;
direct assignment to channel output location;
synthesized ambisonics; and wavefield synthesis.
54. The apparatus as claimed in claim 53, wherein means for applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal comprises:
means for determining a spatially extending parameter;
means for determining at least one position associated with the at least one channel audio signal;
means for determining at least one frequency band position based on the at least one position and the spatial extent parameter; and means for generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
55. The apparatus as claimed in claims 53 to 54, further comprising means for determining a position of the at least one channel relative to the apparatus.
56. The apparatus as claimed in any of claims 53 to 55 when based on any of claims 43 to 47 or 49 to 51, wherein means for controlling the spatially extending synthesis vector base amplitude panning is such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
57. The apparatus as claimed in any of claims 41 to 56, wherein the at least two channel audio signals are one of:
loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
58. The apparatus as claimed in any of claims 41 to 57, wherein means for generating at least one spatially extended audio signal based on the at least one channel audio signal comprises means for generating at least one of:
at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
59. The apparatus as claimed in any of claims 41 to 58, wherein means for receiving and/or determining at least one channel audio signal associated with the at least one sound source comprises means for receiving at least two channel audio signals, wherein the apparatus comprises:
means for selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and means for combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
60. The apparatus as claimed in any of claims 41 to 59, wherein means for receiving at least one channel audio signal further comprises means for receiving at least two audio signals, wherein at least one of the at least two audio signals is a channel based audio signal and at least one of the at least two audio signals is an object-based audio signal from which further channel based audio signals are determined.
GB1706288.6A 2017-04-20 2017-04-20 Enhancing loudspeaker playback using a spatial extent processed audio signal Withdrawn GB2565747A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1706288.6A GB2565747A (en) 2017-04-20 2017-04-20 Enhancing loudspeaker playback using a spatial extent processed audio signal
PCT/FI2018/050277 WO2018193163A1 (en) 2017-04-20 2018-04-19 Enhancing loudspeaker playback using a spatial extent processed audio signal
EP18787484.7A EP3613221A4 (en) 2017-04-20 2018-04-19 Enhancing loudspeaker playback using a spatial extent processed audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1706288.6A GB2565747A (en) 2017-04-20 2017-04-20 Enhancing loudspeaker playback using a spatial extent processed audio signal

Publications (2)

Publication Number Publication Date
GB201706288D0 GB201706288D0 (en) 2017-06-07
GB2565747A true GB2565747A (en) 2019-02-27

Family

ID=58795837

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1706288.6A Withdrawn GB2565747A (en) 2017-04-20 2017-04-20 Enhancing loudspeaker playback using a spatial extent processed audio signal

Country Status (3)

Country Link
EP (1) EP3613221A4 (en)
GB (1) GB2565747A (en)
WO (1) WO2018193163A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023131398A1 (en) * 2022-01-04 2023-07-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for implementing versatile audio object rendering

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2579348A (en) * 2018-11-16 2020-06-24 Nokia Technologies Oy Audio processing
GB2580899A (en) * 2019-01-22 2020-08-05 Nokia Technologies Oy Audio representation and associated rendering
CN110267064B (en) * 2019-06-12 2021-11-12 百度在线网络技术(北京)有限公司 Audio playing state processing method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0966179A2 (en) * 1998-06-20 1999-12-22 Central Research Laboratories Limited A method of synthesising an audio signal
US20080037796A1 (en) * 2006-08-08 2008-02-14 Creative Technology Ltd 3d audio renderer
US20130034235A1 (en) * 2011-08-01 2013-02-07 Samsung Electronics Co., Ltd. Signal processing apparatus and method for providing spatial impression

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080298610A1 (en) * 2007-05-30 2008-12-04 Nokia Corporation Parameter Space Re-Panning for Spatial Audio
EP2154911A1 (en) * 2008-08-13 2010-02-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. An apparatus for determining a spatial output multi-channel audio signal
US9826328B2 (en) * 2012-08-31 2017-11-21 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
EP3280162A1 (en) * 2013-08-20 2018-02-07 Harman Becker Gépkocsirendszer Gyártó Korlátolt Felelösségü Társaság A system for and a method of generating sound

Also Published As

Publication number Publication date
EP3613221A4 (en) 2021-01-13
EP3613221A1 (en) 2020-02-26
WO2018193163A1 (en) 2018-10-25
GB201706288D0 (en) 2017-06-07

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)