WO2018193163A1 - Enhancement of loudspeaker playback using a spatially extended processed audio signal - Google Patents
- Publication number
- WO2018193163A1 (PCT/FI2018/050277)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel
- channel audio
- audio signals
- audio signal
- spatially extended
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
- H04S5/005—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo five- or more-channel type, e.g. virtual surround
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/07—Synergistic effects of band splitting and sub-band processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
Definitions
- the present application relates to apparatus and methods for enhancing loudspeaker playback using a spatial extent processed audio signal.
- Capture of audio signals from multiple sources, and mixing of those audio signals when the sources are moving in the spatial field, requires significant effort. For example, capturing and mixing an audio signal source, such as a speaker or artist within an audio environment such as a theatre or lecture hall, so that it can be presented to a listener with an effective audio atmosphere requires significant investment in equipment and training.
- a commonly implemented system is one where one or more 'external' microphones, for example a Lavalier microphone worn by the user or an audio channel associated with an instrument, are mixed with a suitable spatial (or environmental or audio field) audio signal such that the produced sound comes from an intended direction.
- This system is known in some areas as Spatial Audio Mixing (SAM).
- the SAM system enables the creation of immersive sound scenes comprising "background spatial audio" or ambience and sound objects for Virtual Reality (VR) applications.
- the scene can be designed such that the overall spatial audio of the scene, such as a concert venue, is captured with a microphone array (such as one contained in the OZO virtual camera) and the most important sources captured using the 'external' microphones.
- spatial extent or spatial spread refers to the degree of localization associated with a sound object.
- the sound object is point-like when its spatial extent is at minimum. With a larger spatial extent, the sound is perceived from more than one direction simultaneously.
- a common method to playback sounds using loudspeakers is to use amplitude panning.
- amplitude panning a 'sound object' is positioned between a loudspeaker pair (or inside a loudspeaker triplet) by mixing it to several loudspeakers simultaneously using suitable gain parameters.
- humans perceive a virtual audio object between the loudspeakers (or in the middle of a loudspeaker triplet).
- a sound position exactly coincides with a position of a loudspeaker, the sound is played only from that loudspeaker.
- a known limitation of loudspeaker sound reproduction with amplitude panning is that the perceived spatial extent of a sound object may vary depending on the number of loudspeakers which currently play back the sound object. Depending on the panning direction, the number of active loudspeakers varies. It has been observed that undesired effects, such as a changing spatial spread and spectral coloration, may occur because of this.
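The amplitude panning described above can be sketched as a minimal two-dimensional vector base amplitude panning (VBAP) gain computation for a loudspeaker pair. The function name and the energy normalisation are illustrative assumptions, not the patent's formulation:

```python
import numpy as np

def pair_gains(source_az, spk1_az, spk2_az):
    """Compute 2-D VBAP gains for a source between two loudspeakers.

    Angles are in degrees. Returns energy-normalised gains (g1, g2)
    such that g1**2 + g2**2 == 1. Illustrative sketch only.
    """
    def unit(az):
        # Unit vector towards the given azimuth
        rad = np.radians(az)
        return np.array([np.cos(rad), np.sin(rad)])

    L = np.column_stack([unit(spk1_az), unit(spk2_az)])  # 2x2 base matrix
    g = np.linalg.solve(L, unit(source_az))              # raw gain factors
    g = np.clip(g, 0.0, None)                            # no negative gains
    return g / np.linalg.norm(g)                         # energy normalise

# A source at 30 degrees between loudspeakers at 0 and 60 degrees
# pans with equal gain to both; a source exactly at 0 degrees is
# played only from the loudspeaker at 0 degrees.
g1, g2 = pair_gains(30.0, 0.0, 60.0)
```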
- an apparatus for generating at least two channel audio signals each channel associated with a channel position within a sound scene comprising at least one sound source, the apparatus configured to: receive and/or determine at least one channel audio signal associated with the at least one sound source; generate at least one spatially extended audio signal based on the at least one channel audio signal; combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
- the apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source may be configured to receive at least two neighbouring channel audio signals.
- the apparatus may be further configured to analyse the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein the apparatus configured to generate the at least one spatially extended audio signal may be configured to apply a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
- the apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may be configured to combine: a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
- the apparatus configured to analyse the at least one channel audio signal to determine at least one cross-channel movement parameter may be further configured to: determine at least one joint audio component within the at least two neighbouring channel audio signals; and determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
- the apparatus configured to determine at least one joint audio component within the at least two neighbouring channel audio signals may be further configured to determine at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
- the apparatus configured to determine a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may be configured to determine a cross-channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
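The joint-component and level-change analysis above can be sketched as follows. A frequency band is treated as carrying a joint audio component when the two neighbouring channel signals correlate above a threshold, and a cross-channel movement parameter is derived from opposite-sign level changes. The function, the 0.7 correlation threshold and the 12 dB scaling are illustrative assumptions:

```python
import numpy as np

def movement_parameter(band_a, band_b, prev_a, prev_b, corr_thresh=0.7):
    """Hypothetical sketch of the cross-channel movement analysis."""
    # Normalised correlation between the two channels in this band
    corr = np.dot(band_a, band_b) / (
        np.linalg.norm(band_a) * np.linalg.norm(band_b) + 1e-12)
    if corr <= corr_thresh:
        return 0.0  # no joint component, so no movement evidence

    def level_db(x):
        # Frame level in dB
        return 10.0 * np.log10(np.mean(x ** 2) + 1e-12)

    # Frame-to-frame level changes per channel
    delta_a = level_db(band_a) - level_db(prev_a)
    delta_b = level_db(band_b) - level_db(prev_b)

    # A level increase in one channel with a decrease in the other
    # indicates the joint component moving between the channels
    if delta_a * delta_b < 0:
        return min(1.0, abs(delta_a - delta_b) / 12.0)
    return 0.0
```

For example, a shared sinusoid that grows louder in one channel while fading in its neighbour yields a positive movement parameter, while identical static frames yield zero.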
- the apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source within the sound scene may be configured to: receive a sound source based audio signal; generate the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
- the apparatus may be further configured to analyse position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
- the apparatus configured to generate the at least one spatially extended audio signal may be configured to apply a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis may be controlled based on the at least one cross-channel movement parameter.
- the apparatus configured to analyse position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may be configured to determine a parameter based on a change of an azimuth associated with the audio object position over a determined period, when the change of the azimuth exceeds a predetermined value.
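As a sketch of this position-metadata variant, the azimuth change over an analysis interval can be compared against a threshold; the wrap-around handling and the 5-degree default threshold are assumptions:

```python
def azimuth_movement(az_now, az_prev, threshold_deg=5.0):
    """Derive a cross-channel movement parameter from object position
    metadata: the azimuth change over the analysis interval, reported
    only when it exceeds a predetermined threshold (illustrative)."""
    # Wrap the difference into (-180, 180] so that e.g. 350 -> 10
    # degrees counts as a 20-degree move, not 340 degrees
    diff = (az_now - az_prev + 180.0) % 360.0 - 180.0
    return abs(diff) if abs(diff) > threshold_deg else 0.0
```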
- the apparatus configured to combine the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may be configured to: determine at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; combine the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
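One way to realise the weighted combination is an equal-power crossfade whose extended-signal share shrinks as the cross channel movement parameter grows, so a moving source stays point-like. The specific weight law is an illustrative assumption, not the patent's prescribed one:

```python
import numpy as np

def combine(dry, extended, movement):
    """Crossfade between the unprocessed channel signal and its
    spatially extended version, controlled by the movement parameter
    in [0, 1]; weights chosen to preserve overall energy (sketch)."""
    w_ext = np.sqrt(0.5 * (1.0 - movement))   # less extension while moving
    w_dry = np.sqrt(1.0 - w_ext ** 2)         # energy-preserving complement
    return w_dry * dry + w_ext * extended
```

With full movement (movement == 1.0) the output is the dry channel signal alone; with no movement the dry and extended signals are mixed with equal power.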
- the apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal may be configured to apply at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
- the apparatus configured to apply a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may be configured to: determine a spatially extending parameter; and determine at least one position associated with the at least one channel audio signal; determine at least one frequency band position based on the at least one position and the spatial extent parameter; and generate panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
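The frequency-band positioning step above can be sketched by spreading the bands around the channel's nominal direction over the extent window, so that the summed bands are perceived as one spatially wide source; the uniform distribution rule is an assumption:

```python
import numpy as np

def band_positions(nominal_az, extent_deg, n_bands):
    """Assign each frequency band its own panning azimuth inside the
    extent window centred on the nominal direction (sketch). Panning
    vectors would then be generated per band, e.g. with 2-D VBAP."""
    offsets = np.linspace(-extent_deg / 2.0, extent_deg / 2.0, n_bands)
    return (nominal_az + offsets) % 360.0
```

A zero extent collapses all bands back to the nominal direction, recovering an ordinary point-like pan.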
- the apparatus may be further configured to determine a position of the at least one channel relative to the apparatus.
- the spatially extending synthesis vector base amplitude panning may be configured to be controlled such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
- the at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
- the apparatus configured to generate at least one spatially extended audio signal based on the at least one channel audio signal may be configured to generate at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
- the apparatus configured to receive and/or determine at least one channel audio signal associated with the at least one sound source may be configured to receive at least two channel audio signals, wherein the apparatus may be configured to: selectively generate the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combine the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
- the apparatus configured to receive at least one channel audio signal may be further configured to receive at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based audio signal from which further channel based audio signals are determined.
- a method for generating at least two channel audio signals, each channel associated with a channel position within a sound scene comprising at least one sound source comprising: receiving and/or determining at least one channel audio signal associated with the at least one sound source; generating at least one spatially extended audio signal based on the at least one channel audio signal; combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
- Receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise receiving at least two neighbouring channel audio signals.
- the method may comprise analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein generating the at least one spatially extended audio signal may comprise applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
- Combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise: combining a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and combining a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
- Determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may comprise determining a cross-channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
- Receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene may comprise: receiving a sound source based audio signal; generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
- the method may comprise analysing position data associated with the sound source based audio signal to determine at least one cross-channel movement parameter.
- Generating the at least one spatially extended audio signal may comprise applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis is controlled based on the at least one cross-channel movement parameter.
- Analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may comprise determining a parameter based on a change of an azimuth associated with the audio object position over a determined period, when the change of the azimuth exceeds a predetermined value.
- Combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise: determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
- Generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise applying at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
- Applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may comprise: determining a spatially extending parameter; determining at least one position associated with the at least one channel audio signal; determining at least one frequency band position based on the at least one position and the spatial extent parameter; and generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
- the method may further comprise determining a position of the at least one channel relative to the apparatus.
- Controlling the spatially extending synthesis vector base amplitude panning may be such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
- the at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
- Generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise generating at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
- Receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise receiving at least two channel audio signals, wherein the method may further comprise: selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
- Receiving at least one channel audio signal may further comprise receiving at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based audio signal from which further channel based audio signals are determined.
- an apparatus for generating at least two channel audio signals each channel associated with a channel position within a sound scene comprising at least one sound source
- the apparatus comprising: means for receiving and/or determining at least one channel audio signal associated with the at least one sound source; means for generating at least one spatially extended audio signal based on the at least one channel audio signal; means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals.
- Means for receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise means for receiving at least two neighbouring channel audio signals.
- the apparatus may further comprise means for analysing the at least two channel audio signals to determine at least one cross-channel movement parameter, wherein means for generating the at least one spatially extended audio signal may comprise means for applying a spatially extending synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extending synthesis may be controlled based on the at least one cross-channel movement parameter.
- the means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise means for combining: a first of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the first of the at least two neighbouring channel audio signals to generate a first of at least two neighbouring output channel audio signals; and a second of the at least two neighbouring channel audio signals and a spatially extended channel audio signal based on the second of the at least two neighbouring channel audio signals to generate a second of at least two neighbouring output channel audio signals.
- the means for analysing the at least one channel audio signal to determine at least one cross-channel movement parameter may further comprise: means for determining at least one joint audio component within the at least two neighbouring channel audio signals; and means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals.
- the means for determining at least one joint audio component within the at least two neighbouring channel audio signals may comprise means for determining at least one frequency band of the at least two neighbouring channel audio signals which has a correlation greater than a determined value.
- the means for determining a cross-channel movement parameter based on an analysis of level changes of the at least one joint audio component within the at least two neighbouring channel audio signals may comprise means for determining a cross- channel movement parameter based on determining at least one of: a level increase change of the at least one joint audio component within one of the at least two neighbouring channel audio signals; and a level decrease change of the at least one joint audio component within one of the at least two neighbouring channel audio signals.
- the means for receiving and/or determining at least one channel audio signal associated with the at least one sound source within the sound scene may comprise: means for receiving a sound source based audio signal; means for generating the at least two channel audio signals associated with the at least one sound source based on the sound source based audio signal and positions associated with the at least two channels.
- the apparatus may further comprise means for analysing position data associated with the sound source based audio signal to determine at least one cross- channel movement parameter.
- the means for generating the at least one spatially extended audio signal may comprise means for applying a spatially extended synthesis to the at least two neighbouring channel audio signals to generate at least two spatially extended audio signals, wherein the spatially extended synthesis may be controlled based on the at least one cross-channel movement parameter.
- the means for analysing position data associated with the sound source based audio signal to determine the at least one cross channel movement parameter may comprise means for determining a parameter based on a change of an azimuth associated with the audio object position over a determined period, when the change of the azimuth exceeds a predetermined value.
- the means for combining the at least one channel audio signal and the at least one spatially extended channel audio signal to generate at least two output channel audio signals may comprise: means for determining at least one weighting value for applying to at least one of the at least one channel audio signal and the at least one spatially extended channel audio signal, the at least one weighting value based on the at least one cross channel movement parameter; means for combining the processed at least one channel audio signal and the at least one spatially extended channel audio signal to generate the at least one output channel audio signal.
- Generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise means for applying at least one of: a vector base amplitude panning to the at least one audio signal; direct binaural panning; direct assignment to channel output location; synthesized ambisonics; and wavefield synthesis.
- the means for applying a spatial extent synthesis vector base amplitude panning to the at least one channel audio signal may comprise: means for determining a spatially extending parameter; means for determining at least one position associated with the at least one channel audio signal; means for determining at least one frequency band position based on the at least one position and the spatial extent parameter; and means for generating panning vectors for the application of vector base amplitude panning to frequency bands of the at least one channel audio signal.
- the apparatus may further comprise means for determining a position of the at least one channel relative to the apparatus.
- the means for controlling the spatially extending synthesis vector base amplitude panning may be such that the spatially extending synthesis is modified based on the at least one cross channel movement parameter.
- the at least two channel audio signals may be one of: loudspeaker channel audio signals; and virtual loudspeaker channel audio signals.
- the means for generating at least one spatially extended audio signal based on the at least one channel audio signal may comprise means for generating at least one of: at least one 360 degree spatially extended audio signal; and at least one less than 360 degree spatially extended audio signal.
- the means for receiving and/or determining at least one channel audio signal associated with the at least one sound source may comprise means for receiving at least two channel audio signals, wherein the apparatus may comprise: means for selectively generating the spatially extended audio signals based on the at least two channel audio signals, such that at least one of the two channel audio signals is not spatially extended and at least one of the two channel audio signals is spatially extended; and means for combining the at least two channel audio signals, and the selectively generated spatially extended channel audio signals to generate at least two output channel audio signals.
- the means for receiving at least one channel audio signal may further comprise means for receiving at least two audio signals, wherein at least one of the at least two audio signals may be a channel based audio signal and at least one of the at least two audio signals may be an object-based audio signal from which further channel based audio signals are determined.
- a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
- An electronic device may comprise apparatus as described herein.
- a chipset may comprise apparatus as described herein.
- Embodiments of the present application aim to address problems associated with the state of the art.
- Figure 1 shows schematically an example loudspeaker playback system employing a spatially extended audio signal according to some embodiments
- Figure 2 shows schematically an example loudspeaker arrangement using amplitude panning of sound sources
- Figure 3 shows schematically an example loudspeaker arrangement using spatially extended audio signals as part of a panning of the sound sources according to some embodiments
- Figure 4 shows schematically the example loudspeaker playback system employing a spatially extended audio signal as shown in Figure 1 in further detail according to some embodiments;
- Figure 5 shows a flow diagram of the operation of the example loudspeaker playback system shown in Figure 4 according to some embodiments
- Figure 6 shows schematically the example loudspeaker playback system shown in Figure 5 in further detail with respect to channel based input audio signals according to some embodiments;
- Figure 7 shows schematically the spatial extent synthesizer shown in the figures above in further detail according to some embodiments;
- Figure 8 shows schematically an example device suitable for implementing the apparatus shown above according to some embodiments.
- the concept as presented in the embodiments hereafter is the spatial extending of the output channel or loudspeaker signals, and the combining or summing of the spatially extended audio signals with the normal (or Vector-Base-Amplitude-Panning (VBAP)-rendered) loudspeaker audio signals.
- the spatially extending of the loudspeaker audio signals may be chosen so that together they cover the whole 360 degrees.
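The full 360-degree coverage can be illustrated by tiling the circle with one extent window per loudspeaker; the equal-width split below is one simple assumption, as the patent leaves the exact division open:

```python
def extent_windows(speaker_azimuths):
    """Assign each loudspeaker an extent window (start, end) in degrees
    so that the windows together tile the full 360 degrees (sketch)."""
    n = len(speaker_azimuths)
    width = 360.0 / n  # equal share of the circle per loudspeaker
    return [((az - width / 2.0) % 360.0, (az + width / 2.0) % 360.0)
            for az in speaker_azimuths]

# For a regular five-loudspeaker ring each channel is spatially
# extended over a 72-degree window centred on its azimuth.
windows = extent_windows([0.0, 72.0, 144.0, 216.0, 288.0])
```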
- the result of the proposed processing is that it makes the loudspeaker reproduction more spatially enveloping and makes perceiving loudspeaker locations less obvious. This leads to better immersion as the listener has a smaller likelihood of perceiving the speaker locations but feels more fully immersed in the intended sound scene.
- the invention reduces changes in sound spectrum and spectral spread across different locations, without requiring any increase to the spectral spread of sounds.
- As the original loudspeaker signal is combined with the spatially extended signal, it is possible in some embodiments to maintain a point-like perception of sounds when needed and still increase the spatial envelopment and uniformity of sound reproduction across different spatial positions.
- the spatially extending of the input channel or loudspeaker audio signal is selectively applied to some channels.
- the system spatially extends the loudspeaker audio signals for all channels except the centre channel. This is because the centre channel conventionally is the speech channel and spatially extending the audio signals associated with the speech channel may produce artificial or unnatural sounding sound scenes.
- the input channels or input audio signals are analysed in order to determine whether spatially extending the audio signal is to be applied or whether the audio signal may be output without processing.
- all of the channel (loudspeaker) audio signals are combined together and spatially extended or widened to a full 360 degrees.
- the spatially extended combined audio signals are then combined with the virtual loudspeaker or point sound source audio signals before rendering or outputting these.
- the apparatus may be part of any suitable electronic device or apparatus configured to capture an audio signal or receive the audio signals and other information signals.
- with respect to figure 1 an example loudspeaker playback system employing a spatially extended audio signal mixing suitable for implementing embodiments is shown.
- the example shown in figure 1 is a channel based audio signal system, however as described hereafter the system may in some embodiments be configured to receive object based audio signals.
- a channel based audio signal is one wherein the sound scene is represented by one or more audio signals which represent the audio signals generated by playback equipment located in the listener's domain.
- a channel based audio signal may be a panned loudspeaker channel based system where the loudspeaker channels are in a 5.1 or other suitable channel format.
- An object based audio signal is one wherein the sound scene is represented by one or more sound sources each of which has an audio signal and a defined position within the sound scene (and which may be mapped to a position with respect to the listener).
- figure 1 shows input channel 1 (loudspeaker 1) 101, input channel 2 (loudspeaker 2) 103, and input channel N (loudspeaker N) 105.
- the audio signals associated with the input channels can be passed to the spatially extending synthesiser 113 and to the mixer 111.
- the system furthermore shows a spatially extending synthesizer 113.
- the spatial extending synthesiser 113 is configured to receive the input audio signals.
- the spatially extending synthesizer 113 is configured to receive the audio signals from the input channels (loudspeaker channels).
- the spatial extending synthesiser 113 is further configured to receive audio signal positional information. This may in the channel based example be the loudspeaker channel position information or may in the object based audio signal input example (as discussed in further detail later) be positional information associated with the sound source represented by the audio signal.
- the spatial extending synthesiser 113 is further configured to receive a spatially extending control input.
- the spatially extending control input may be user input to assist in the operation of spatially extending the audio signals as discussed in further detail later.
- the spatial extending synthesiser 113 may be configured to spatially extend the audio signals and output the spatially extended input audio signal to the mixer 111.
- the concept associated with embodiments described herein is that a spatially extending synthesizer receives the loudspeaker channels (or rendered channels) as input and spatially extends each channel to cover a certain area.
- each loudspeaker channel in a 4.0 reproduction setup may be spatially extended to cover an area of 90 degrees around its own position.
- the system comprises a mixer 111 or combiner.
- the mixer is configured to receive the input audio signals (shown in figure 1 as the loudspeaker input channel audio signals) and associated spatially extended audio signals from the spatially extending synthesizer 113.
- the combiner may be configured to combine the audio signals by selectively enabling either the non-extended signal or the spatially extended signal. This may be seen as an OR operation applied to the audio signals. For example, when the sources are moving a lot a spatially extended version may be used, whereas when there is little movement then the non-spatially extended version (original version) may be used. Any suitable method of combining may be used.
- the mixer 111 may furthermore be configured to receive a direct/extended control input (for example as shown in figure 1 also from the spatial extending synthesiser 113) configured to control the mix portions of the direct (or input channel) audio signal and the spatially extended audio signal.
- the mixer 111 is in some embodiments configured to output each mixed audio signal to a suitable output.
- the mixer 111 shown in figure 1 is shown mixing the input channel 1 101 audio signal with the spatially extended audio signal channels.
- Input Channel 1 contains the input for one loudspeaker channel.
- when Input Channel 1 is fed to the Spatially Extending Synthesizer, it is split into N output channels.
- the Mixer mixes the original N channel signals with N channel signals which contain the outputs of the Spatially Extended Signals.
- the spatially extended version of Channel 1 is carried in the N output channels of the Spatially Extending Synthesizer, not just Channel 1.
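The direct/extended mixing described above can be sketched as a per-channel crossfade. The equal-power gain law and the function and parameter names below are illustrative assumptions rather than a definitive implementation of the mixer 111:

```python
import numpy as np

def mix_direct_extended(direct, extended, extended_ratio):
    """Mix the original (direct) channel signals with their spatially
    extended versions. Both inputs are (channels, samples) arrays;
    extended_ratio in [0, 1] selects how much of the extended version
    is used. An equal-power crossfade keeps the overall level roughly
    constant across the whole mix range."""
    g_ext = np.sqrt(extended_ratio)
    g_dir = np.sqrt(1.0 - extended_ratio)
    return g_dir * direct + g_ext * extended
```

With `extended_ratio` at 0 only the direct (point-like) signals are output, at 1 only the spatially extended versions, and intermediate values blend the two as described for the direct/extended control input.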
- Figures 2 and 3 both show an example 5.0 virtual loudspeaker configuration however any suitable number of (virtual) loudspeakers and any suitable configuration or arrangement of the loudspeakers may be implemented. Similarly any suitable number of audio signals may be employed.
- the input can also be a monophonic channel, which is then mixed to a maximum of two output channels by amplitude panning methods. When the monophonic channel source position is exactly at a position of a loudspeaker, it is emitted from a single output channel only. When the signal is in between loudspeakers, it is mixed to two output channels.
- the input signal in some embodiments is mixed to a maximum of three output channels (loudspeakers).
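The amplitude panning of a monophonic source to at most two output channels can be sketched with the standard 2-D VBAP matrix-inverse formulation; the function name and the equal-power normalisation are illustrative assumptions:

```python
import numpy as np

def pan_mono_to_pair(azimuth_deg, spk_left_deg, spk_right_deg):
    """Compute amplitude-panning gains for the loudspeaker pair
    bracketing a monophonic source. When the source is exactly at one
    loudspeaker, all the energy goes to that channel; in between, it
    is mixed to both channels (2-D VBAP)."""
    p = np.array([np.cos(np.radians(azimuth_deg)),
                  np.sin(np.radians(azimuth_deg))])          # source unit vector
    L = np.array([[np.cos(np.radians(spk_left_deg)), np.sin(np.radians(spk_left_deg))],
                  [np.cos(np.radians(spk_right_deg)), np.sin(np.radians(spk_right_deg))]])
    g = p @ np.linalg.inv(L)       # solve p = g L for the gain pair
    return g / np.linalg.norm(g)   # equal-power normalisation
```

A source at exactly a loudspeaker position yields gains (1, 0); a source midway between the pair yields equal gains on both channels, matching the behaviour described above.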
- the following example shows an input audio signal which is a channel based audio signal wherein there are 5 input channels and 5 output channels.
- the 5.0 loudspeaker system shown in figures 2 and 3 comprises a front right virtual loudspeaker channel 203, front centre virtual loudspeaker channel 209, front left virtual loudspeaker channel 205, a rear left virtual loudspeaker channel 207 and a rear right virtual loudspeaker channel 211. Furthermore with respect to the virtual loudspeakers is shown a listener position 213.
- the listener position 213 is the position at which a user or listener of the system is positioned relative to the virtual loudspeaker channels.
- the user or listener is configured to be listening to the audio signals via a set of headphones.
- this system may be implemented with physical loudspeakers located in the listener's sound scene.
- the motion of the sound source is represented within figure 2, which shows an example whereby only direct audio signals (without any spatially extended audio signal components) are output, by the associated audio signal gain (or signal level) from the front right virtual loudspeaker decreasing and the associated audio signal gain from the rear right virtual loudspeaker increasing.
- the listener may become aware of the 'loudspeakers' and thus be distracted from the listening experience.
- the listener may perceive a virtual sound object between the loudspeakers.
- the timbre of the panned source is different depending on its position between the loudspeakers. It is brightest when exactly in one loudspeaker and dullest when exactly in between the speakers.
- figure 3 shows loudspeaker channel audio signals being spatially widened or extended 321 .
- the embodiments enable playback of not only the original sound but also spatially extended versions of each loudspeaker channel.
- the energy is divided between the point source and the extended version of the audio signal.
- the spatially extending synthesiser 113 shown in figure 4 is one which is configured to be able to accept both channel based input audio signals and object based audio signals.
- the spatially extending synthesizer is configured to accept one or the other of the audio signal input formats and as such may only comprise the features or components required to process that audio signal input format.
- the spatially extending synthesiser 1 13 comprises an object/channel based signal determiner 1401 .
- the object/channel based signal determiner 1401 is configured to determine whether or not the input signals are channel based or object based. For example the audio signals shown in figure 1 are channel based.
- the object/channel based signal determiner 1401 may be configured to control the processing or outputting of the input audio signals based on the determination.
- the sound source or object position information can be decoded from the input and be passed directly to a cross channel analyser 1405 and to an object to channel renderer 1403.
- the object position information may also in some embodiments be represented with side information (or metadata or the like), for example, with (azimuth, elevation, distance, timestamp) which indicate the position of that sound object as polar coordinates (or other co-ordinate systems) at a time indicated by timestamp.
- the audio signal can be passed to the cross-channel analyser 1405 and to a joint sound component determiner 1407.
- the spatially extending synthesiser 113 comprises an object to channel renderer 1403.
- the object to channel render 1403 is configured to receive the object or sound source based audio signals and render the audio signals to an output channel format suitable for spatially extending.
- the renderer is configured to apply a spatial mapping of the audio signal based on the positional information of the sound source or object.
- the output channel rendered audio signals can then be passed to a spatially extending processor 1411.
- the channel renderer 1403 function is implemented within a spatially extending synthesizer configured to receive a monophonic input (which splits the signal to N output channels) rather than the example shown where the spatially extending synthesizer is configured to receive input loudspeaker channels.
- the spatially extending synthesiser 113 comprises a joint sound component determiner 1407.
- the joint sound component determiner 1407 can be configured to receive the audio signals, which are channel based audio signals, and determine components of the audio signals which are common. These determined joint sound components can be passed to the cross channel analyser 1405.
- the spatially extending synthesiser 113 comprises a cross channel analyser 1405.
- the cross channel analyser 1405 can be configured to receive the audio signals and determine the amount of cross-channel movement. For example where the audio signals are channel based audio signals this may be determined by analysing the level changes of joint sound components between channels. Where the audio signals are object based audio signals then this may be determined by analysis of the sound source or object motion.
- the sound source/object position data may be represented as polar coordinates (azimuth, elevation, distance) or Cartesian coordinates (x, y, z), and a timestamp indicating the time to which the position corresponds. The analyser may analyse the position data to determine how much movement there is.
- the analyser is configured to determine the azimuth range of sound object positions across a certain time interval. If the azimuth range of a sound object exceeds a predetermined threshold in degrees (e.g. 10 degrees), then it is possible to determine that there is movement. The larger the range then the more movement there is.
- the spatial extent processing for channel signals may be enabled.
- the amount of movement may adjust the direct to extended ratio of the mixer: the more movement there is, the more gain is added to the spatially extended signal.
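The mapping from observed movement to the direct/extended mix ratio might be sketched as follows; the 10-degree threshold comes from the example above, while the linear ramp and the 90-degree full-extension point are assumed tuning values:

```python
def extension_ratio_from_movement(azimuth_range_deg, threshold_deg=10.0, full_deg=90.0):
    """Map the azimuth range of a sound object observed over an analysis
    interval to a direct/extended mix ratio: below the movement threshold
    no extension is applied; above it the extended-signal gain grows
    linearly until the range reaches full_deg (an assumed tuning value),
    so the more movement there is, the more gain the extended signal gets."""
    if azimuth_range_deg <= threshold_deg:
        return 0.0
    return min(1.0, (azimuth_range_deg - threshold_deg) / (full_deg - threshold_deg))
```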
- the cross-channel analyser 1405 can be configured to output the results of the analysis to a spatially extending channel controller 1409.
- the spatially extending synthesiser 113 comprises a spatially extending channel controller 1409.
- the spatially extending channel controller 1409 is configured to receive the output of the cross channel analyser 1405 and determine whether or not the motion of the cross channel component is sufficient to require a spatial extending of the audio signal.
- the controller in some embodiments is configured to determine specific spatially extending control signals to control the amount of spatial extension to be applied by the spatially extending synthesiser/processor 1411 based on the movement of the cross channel component.
- the controller in some embodiments is configured to determine control signals to control the mixer and thus control the amount of spatially extended audio signal to be combined with the audio signal within the mixer 111.
- the spatial extending synthesiser 113 comprises a spatial extending synthesiser/processor 1411.
- the spatially extending synthesiser/processor 1411 is configured to receive the audio signal for spatially extending (for example from the object to channel renderer for an object based audio signal or from the input directly for a channel based audio signal) and furthermore control parameters for controlling the spatially extending from the spatially extending channel controller 1409.
- the spatially extending synthesiser/processor 1411 may thus spatially extend the audio signal based on the control parameters and output a spatially extended audio signal to the mixer 111.
- with respect to figure 5 an example operation of the spatially extending synthesiser shown in figure 4 (and the mixer shown in figures 1 and 4) is shown by a flow diagram.
- the synthesizer shown in figure 4 is one suitable for receiving both object based and channel based inputs.
- a similar but pruned flow diagram may be implemented for a synthesizer configured to receive only one of the audio input formats.
- the input audio signals are determined to be either object or channel based as shown in figure 5 by step 501 .
- the operation is configured to determine joint sound components between channels as shown in figure 5 by step 507.
- the amount of cross-channel movement may then be determined by analysing the level changes of joint sound components between channels as shown in figure 5 by step 509.
- the amount of cross- channel movement may be determined by an analysis of the object or sound source position data as shown in figure 5 by step 505.
- the next step is to determine spatially extending control parameters for the channels based on the amount of cross channel movement as shown in figure 5 by step 511.
- if the audio signals are object based then the audio signal objects are rendered to channel audio signals as shown in figure 5 by step 503.
- the spatial extending synthesis is applied to the channel based audio signals based on the control parameters as shown in figure 5 by step 515.
- the original or rendered audio signals (the direct audio signals) and the spatially extended audio signals are combined or mixed based on the control signals as shown in figure 5 by step 516.
- the mixed audio signal channels are then output as shown in figure 5 by step
- the joint sound component determiner comprises a time to frequency domain transformer such as a Short Time Fourier Transform (STFT).
- the time to frequency domain transformer 601 is configured to receive the input channel based audio signals and determine suitable frequency domain representations.
- the channel-based signal thus may be subjected to short-time Fourier transform (STFT) analysis, using, for example, a temporal analysis window of 20 ms in length.
- the frequency domain representations can be passed to a sub-band filter 603.
- the joint sound component determiner 1407 further comprises a sub-band filter configured to receive the frequency representations of the input audio signals and generate sub-band groups of the frequency domain representations.
- the sub-band filter thus may be configured to determine 32 frequency bands.
- the sub band filter may for example be configured to generate Equivalent rectangular bandwidth (ERB) determined frequency bands.
- the sub-band signals may be written as $X_k^b(n) = X_k(n_b + n)$, $n = 0, \ldots, n_{b+1} - n_b - 1$, where $n_b$ is the first index of the $b$-th subband, $n$ is the discrete frequency index and $k$ is the channel index.
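The sub-band grouping may be sketched as simple index slicing of a channel's STFT spectrum; the function name and the example band edges are illustrative:

```python
import numpy as np

def split_to_subbands(X, band_edges):
    """Split a channel's STFT spectrum X (length-N array) into subband
    signals X^b(n) = X(n_b + n), where band_edges[b] = n_b is the first
    bin index of the b-th subband and band_edges[b + 1] is the first bin
    of the next one. Returns a list of B subband arrays."""
    return [X[band_edges[b]:band_edges[b + 1]] for b in range(len(band_edges) - 1)]
```

In practice the edges would follow a perceptual scale such as the ERB bands mentioned above rather than the uniform toy values used in the test.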
- the sub-band filter is configured to output these sub-band signals to a band-wise correlator 605.
- the joint sound component determiner 1407 comprises a band-wise correlator 605.
- the band-wise correlator 605 may be configured to band wise correlate (neighbouring) channel audio signals to determine the level of correlation between these audio signals.
- the output of the band wise correlator 605 can be configured to be output to a joint sound component analyser 607.
- the joint sound component determiner comprises a joint sound component analyser 607.
- the joint sound component analyser 607 is configured to compare the band-wise correlation outputs to a determined threshold value to determine whether or not there are joint sound components within the audio signals with sufficient similarity which may be used to determine motion within the neighbouring channels.
- the band-wise correlator is configured to find a delay $\tau_b$ that maximizes the correlation between two channels for sub-band $b$. This can be accomplished by creating time-shifted versions of the signal for a channel (e.g. in channel 2), and correlating these with the signal for another channel (e.g. in channel 3).
- a time shift of $\tau_b$ time domain samples of $X_k^b(n)$ can be obtained as $X_{k,\tau_b}^b(n) = X_k^b(n)\, e^{-j \frac{2\pi n \tau_b}{N}}$, where $N$ is the transform length.
- the optimal delay $\tau_b$ is obtained from $\max_{\tau_b} \mathrm{Re}\{ X_{2,\tau_b}^{b\,*} X_3^b \}$, $\tau_b \in [-D_{max}, D_{max}]$, where Re indicates the real part of the result and $*$ denotes combined transpose and complex conjugate operations. $X_{2,\tau_b}^b$ and $X_3^b$ are considered vectors with length of $n_{b+1} - n_b$ samples.
- the range of searching for the delay $D_{max}$ is selected such that it covers the expected time differences between loudspeaker channels depending on the setup.
- the band b on channels 2 and 3 may be determined to contain the same content.
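The delay search described above, with the time shift applied as a frequency-domain phase ramp, might be sketched as follows (the function name and the brute-force search over candidate delays are illustrative assumptions):

```python
import numpy as np

def best_interchannel_delay(X2b, X3b, bins, N, d_max):
    """Find the delay tau_b (in samples) maximising the correlation
    between the subband spectra of channels 2 and 3. The time shift is
    applied in the frequency domain as X(n) * exp(-j*2*pi*n*tau/N);
    `bins` holds the absolute STFT bin indices of the subband, N is the
    transform length and the search covers tau in [-d_max, d_max]."""
    best_tau, best_corr = 0, -np.inf
    for tau in range(-d_max, d_max + 1):
        shifted = X2b * np.exp(-2j * np.pi * bins * tau / N)
        corr = np.real(np.vdot(shifted, X3b))  # vdot conjugates its first argument
        if corr > best_corr:
            best_tau, best_corr = tau, corr
    return best_tau, best_corr
```

A high maximum correlation at some delay suggests the two channels carry the same content in that band, as in the channel 2 / channel 3 example above.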
- the results of the analysis in some embodiments are output to the cross channel analyser 1405 and a gain change determiner.
- the cross channel analyser 1405 comprises a gain change determiner 611.
- the gain change determiner may be configured to compare the joint sound components to determine the change of signal levels between them; in other words, to determine whether or not the audio source is moving from one channel to another channel by analysing the signal levels of the joint sound components.
- the movement of sound between loudspeaker channels can be determined by observing amplitude panning type behaviour of frequency bands in the channel input.
- for example, band b may be determined to contain similar content for channels 2 and 3.
- the system may continue to monitor the level (energy) of band b in channels 2 and 3 over a certain number of processing frames (for example, 10 frames). If during this time it is observed that the energy reduces on one channel and increases on the other, the content of that band is moving across the channels. The more frequency bands are moving simultaneously, the more movement there is in the sound scene.
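The level-monitoring step might be sketched as follows; the trend-agreement fraction `min_trend` is an assumed tuning value, not taken from the source:

```python
import numpy as np

def count_moving_bands(energies_ch2, energies_ch3, min_trend=0.8):
    """Given per-frame band energies (frames x bands) for two channels
    that were found to share content, count the bands whose energy
    steadily falls on one channel while rising on the other, i.e.
    amplitude-panning-like movement. min_trend is the assumed fraction
    of frame-to-frame steps that must agree in direction."""
    d2 = np.diff(energies_ch2, axis=0)   # frame-to-frame energy changes, channel 2
    d3 = np.diff(energies_ch3, axis=0)   # frame-to-frame energy changes, channel 3
    down_up = (np.mean(d2 < 0, axis=0) >= min_trend) & (np.mean(d3 > 0, axis=0) >= min_trend)
    up_down = (np.mean(d2 > 0, axis=0) >= min_trend) & (np.mean(d3 < 0, axis=0) >= min_trend)
    return int(np.sum(down_up | up_down))
```

The returned count gives a simple measure of how much movement there is in the sound scene: the more bands move simultaneously, the more movement.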
- the spatially extending synthesiser receives the original or rendered channel audio signals and spatially extends the audio signals to a defined spatial extent based on the spatially extending control parameters such as those generated by the spatially extending channel controller.
- the synthesizer takes as input a mono sound source audio signal and spatially extending parameters (width, height and depth).
- the spatially extending synthesiser comprises a suitable time to frequency domain transformer.
- the spatially extending synthesiser comprises a Short-Time Fourier Transform (STFT) configured to receive the audio signal and output a suitable frequency domain output.
- the input is a time-domain signal which is processed with hop-size of 512 samples.
- a processing frame of 1024 samples is used, and it is formed from the current 512 samples and previous 512 samples.
- the processing frame is zero-padded to twice its length (2048 samples) and Hann windowed.
- the Fourier transform is calculated from the windowed frame producing the Short-Time Fourier Transform (STFT) output.
- STFT Short-Time Fourier Transform
- the STFT output is symmetric, thus it is sufficient to process the positive half of 1024 samples including the DC component, totalling 1025 samples.
- any suitable time to frequency domain transform may be used.
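The framing described above (512-sample hop, 1024-sample frame, zero-padding to 2048 samples, Hann windowing) can be sketched as follows; applying the window before the zero-padding is an assumed interpretation of the order of operations:

```python
import numpy as np

def stft_frame(current_512, previous_512):
    """Build one STFT frame as described: a 1024-sample processing frame
    formed from the previous and current 512-sample hops, Hann windowed,
    zero-padded to 2048 samples and Fourier transformed. Only the
    positive half plus the DC bin (1025 samples) needs processing."""
    frame = np.concatenate([previous_512, current_512])        # 1024 samples
    windowed = frame * np.hanning(len(frame))                  # Hann window
    padded = np.concatenate([windowed, np.zeros(len(frame))])  # zero-pad to 2048
    return np.fft.rfft(padded)                                 # 1025 complex bins
```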
- the spatially extending synthesiser further comprises a filter bank 403.
- the filter bank 403 is configured to receive the output of the STFT 401 and using a set of filters generated based on a Halton sequence (and with some default parameters) generate a number of frequency bands 405.
- Halton sequences are sequences used to generate points in space for numerical methods such as Monte Carlo simulations. Although these sequences are deterministic, they are of low discrepancy, that is, appear to be random for many purposes.
- the filter bank 403 comprises a set of 9 different distribution filters, which are used to create 9 different frequency domain signals where the signals do not contain overlapping frequency components. These signals are denoted Band 1 F 405₁ to Band 9 F 405₉ in figure 7.
- the filtering can be implemented in the frequency domain by multiplying the STFT output with stored filter coefficients for each band.
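A Halton sequence element for a given prime base can be generated with the classic radical-inverse construction; how the values are mapped to the stored distribution-filter coefficients is not specified here, so the sketch covers only the sequence itself:

```python
def halton(index, base):
    """Return element `index` (1-based) of the Halton low-discrepancy
    sequence for a prime base: the digits of `index` in that base are
    mirrored about the radix point, giving deterministic values in
    (0, 1) that appear random for many purposes."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result
```

For base 2 the sequence begins 1/2, 1/4, 3/4, 1/8, 5/8, ..., filling the unit interval evenly, which is what makes it useful for distributing frequency components across bands.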
- the spatially extending synthesiser further comprises a spatially extending input 400.
- the spatially extending input 400 may be configured to define the spatially extending of the audio signal.
- the spatially extending synthesiser may further comprise an object/channel position input/determiner 402.
- the object/channel position input/determiner 402 may be configured to determine the spatial position of the sound sources.
- the spatially extending synthesiser may further comprise a band position determiner 404.
- the band position determiner 404 may be configured to receive the outputs from the channel object position input/determiner 402 and the spatially extending input 400 and from these generate an output passed to the vector base amplitude panning processor 406.
- the spatially extending synthesiser may further comprise a vector base amplitude panning (VBAP) processor 406.
- the VBAP 406 may be configured to generate control signals to control the panning of the frequency domain signals to desired spatial positions. Given the spatial position of the sound source (azimuth, elevation) and the desired spatially extending for the source (width in degrees), the system calculates a spatial position for each frequency domain signal. For example, if the spatial position of the sound source is zero degrees azimuth (front), and the spatially extending is 90 degrees, the VBAP may position the frequency bands at azimuths 45, 33.75, 22.5, 11.25, 0, -11.25, -22.5, -33.75 and -45 degrees.
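The band-position calculation in the example above can be sketched as an even spread of the band positions across the desired extent, centred on the source azimuth (the function name is illustrative):

```python
import numpy as np

def band_positions(source_azimuth_deg, extent_deg, n_bands=9):
    """Spread n_bands frequency-band positions evenly across the desired
    spatial extent, centred on the source azimuth. For a source at
    0 degrees with a 90-degree extent and 9 bands this gives the
    positions 45, 33.75, ..., -45 degrees quoted in the text."""
    return source_azimuth_deg + np.linspace(extent_deg / 2.0, -extent_deg / 2.0, n_bands)
```

Each resulting position would then be handed to the VBAP gain calculation for its band; as noted below, the perceived span may differ slightly from the synthesized one.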
- the span of the bands might not exactly match with the desired spatial extent but may be smaller or larger.
- models of human sound perception may be used to compensate for the difference in perceived to synthesized spatial extent; in particular in cases where humans may perceive sound sources narrower than they actually are rendered.
- Spatial extent in elevation (height) domain may be performed in the same manner as for the azimuth (width) domain as above.
- Spatially extending in the depth domain may also be performed in some embodiments by rendering some bands at different depths using known methods for sound distance rendering.
- Methods for sound distance rendering include, for example, adding distance attenuation, with sounds further away being quieter.
- the distance attenuation may be implemented with the 1/r-rule where r is the distance.
- the distance rendering may be performed by modifying the direct to reverberant ratio of the sound. If there is any reverberation present, reverberation energy usually is constant but the direct signal energy becomes smaller.
- distance rendering may be performed by attenuating the early reflections in the reverberation so that they become quieter and sparser with increasing distance.
- adding a low-pass filter to attenuate high frequencies can be implemented to approximate the effect of the attenuation of higher frequency components when the distance increases.
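The distance-rendering cues above (1/r attenuation plus high-frequency roll-off) might be sketched as follows; the cutoff-versus-distance mapping and the one-pole filter are assumed illustrative choices, not the patent's defined method:

```python
import numpy as np

def render_distance(signal, distance, sample_rate=48000.0, ref_distance=1.0):
    """Approximate distance rendering: apply the 1/r gain rule plus a
    one-pole low-pass whose cutoff falls with distance, mimicking the
    attenuation of higher frequency components as distance increases."""
    signal = np.asarray(signal, dtype=float)
    gain = ref_distance / max(distance, ref_distance)    # 1/r attenuation rule
    cutoff = 20000.0 / max(distance, 1.0)                # assumed cutoff mapping
    alpha = np.exp(-2.0 * np.pi * cutoff / sample_rate)  # one-pole coefficient
    out = np.empty_like(signal)
    y = 0.0
    for i, x in enumerate(signal):
        y = (1.0 - alpha) * x + alpha * y                # low-pass filter state
        out[i] = gain * y
    return out
```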
- the VBAP processor 406 may therefore be used to calculate a suitable gain for the signal, given the desired loudspeaker positions.
- VBAP processor 406 may provide gains for a signal such that it can be spatially positioned to a suitable position. These gains may be passed to a series of multipliers 407.
- the spatial extent synthesis or spatially extending control may be implementation agnostic and any suitable implementation may be used to generate the spatially extending control.
- the spatially extending control may implement direct binaural panning (using Head related transfer function filters for directions), direct assignment to the output channel locations (for example direct assignment to the loudspeakers without using any panning), synthesized ambisonics, and wave-field synthesis.
- the spatially extending synthesiser may further comprise a series of multipliers 407.
- the series of multipliers comprises multipliers 407₁ to 407₉, however any suitable number of multipliers may be used.
- Each frequency domain band signal may be multiplied in the multiplier 407 with the determined VBAP gains.
- the products of the VBAP gains and each frequency band signal may be passed to a series of output channel sum devices 409.
- the spatially extending synthesiser 215 may further comprise a series of sum devices 409.
- the sum devices 409 may receive the outputs from the multipliers and combine them to generate an output channel band signal 411.
- a 4.0 loudspeaker format output is implemented with outputs for front left (Band FL F 411₁), front right (Band FR F 411₂), rear left (Band RL F 411₃), and rear right (Band RR F 411₄) channels which are generated by sum devices 409₁, 409₂, 409₃ and 409₄ respectively.
- other loudspeaker formats or number of channels can be supported.
- in some embodiments other panning methods, such as panning laws, may be used, or the signals could be assigned to the closest loudspeakers directly.
- the spatially extending synthesiser may further comprise a series of inverse Short-Time Fourier Transforms (ISTFT) 413.
- as shown in figure 7 there is an ISTFT 413₁ associated with the FL signal, an ISTFT 413₂ associated with the FR signal, an ISTFT 413₃ associated with the RL signal output and an ISTFT 413₄ associated with the RR signal.
- component signals may be provided for rendering and also for analysis for the purpose of ensuring even energy distributions between the components.
- each loudspeaker or channel audio signal may be selectively spatially extended.
- all of the loudspeaker channels except the centre channel may be processed and the centre channel is unprocessed since it often contains speech (e.g., movie 5.1 mixes).
- the channel processing may be controlled based on a speech or voice activity detector analysis of the audio signal.
- it may be determined whether the audio signal for the centre channel (or any other channel) comprises mostly speech. If the audio signal is mostly speech, then that channel may receive a smaller spatial extent value or not be spatially extended at all.
- all channels or loudspeaker audio signals are summed together and this audio signal is then spatially widened to 360 degrees (or other suitable extent) before combining this to the original loudspeaker audio signals.
- the user may be able to use the spatially extending control input to affect the amount of spatially extending present in the output signal. This may allow a tuning of the reproduction between normal, almost point-like, audio signals and widely spread signals.
- These embodiments may be implemented when the user starts to listen to a sound scene, which contains a sound source or object moving around him. Firstly, the user may listen to the sound scene with the method disabled (no spatial extending applied). As the sound object moves around the user, the user hears its sound changing depending on whether the sound comes exactly from the direction of a loudspeaker or from between two loudspeakers. The sound sounds a bit dull between speakers, and it also sounds a bit larger. At the location of a speaker, the sound is sharp, clear, and narrow. The user may not be fully convinced about the reproduction quality and does not feel fully immersed. The user may then enable the proposed processing (or in other embodiments the processing may be enabled automatically).
- after the processing is enabled, the user would experience an increased immersion as the sound starts to fill the whole room. Moreover, as the sound moves around, the user would no longer hear timbral or spatial spread changes; the only thing which changes is the object's spatial position. The user is likely to be much happier with the sound quality.
- the device may be any suitable electronics device or apparatus.
- the device 1200 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
- the device 1200 may comprise a microphone 1201 .
- the microphone 1201 may comprise a plurality (for example a number N) of microphones. However it is understood that there may be any suitable configuration of microphones and any suitable number of microphones.
- the microphone 1201 is separate from the apparatus and the audio signals are transmitted to the apparatus by a wired or wireless coupling.
- the microphone 1201 may in some embodiments be the microphone array as shown in the previous figures.
- the microphone may be a transducer configured to convert acoustic waves into suitable electrical audio signals.
- the microphones can be solid state microphones. In other words the microphones may be capable of capturing audio signals and outputting a suitable digital format signal.
- the microphone 1201 can comprise any suitable microphone or audio capture means, for example a condenser microphone, capacitor microphone, electrostatic microphone, Electret condenser microphone, dynamic microphone, ribbon microphone, carbon microphone, piezoelectric microphone, or microelectrical-mechanical system (MEMS) microphone.
- the microphone can in some embodiments output the captured audio signal to an analogue-to-digital converter (ADC) 1203.
- the device 1200 may further comprise an analogue-to-digital converter 1203.
- the analogue-to-digital converter 1203 may be configured to receive the audio signals from each of the microphones 1201 and convert them into a format suitable for processing. In some embodiments where the microphone is an integrated microphone the analogue-to-digital converter is not required.
- the analogue-to-digital converter 1203 can be any suitable analogue-to-digital conversion or processing means.
- the analogue-to-digital converter 1203 may be configured to output the digital representations of the audio signal to a processor 1207 or to a memory 1211.
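As an illustration of the kind of conversion such an analogue-to-digital stage performs, the following sketch quantizes floating-point samples in [-1.0, 1.0] to 16-bit PCM values. The function name and scaling are illustrative assumptions, not part of the patent.

```python
def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to 16-bit PCM values,
    a common digital format an ADC stage might output."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip to full scale
        out.append(int(round(s * 32767)))   # scale to the 16-bit range
    return out

print(to_pcm16([0.0, 1.0, -1.0]))  # [0, 32767, -32767]
```

Real ADC hardware performs this quantization in the analogue domain, of course; the sketch only shows the resulting numeric mapping.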
- the device 1200 comprises at least one processor or central processing unit 1207.
- the processor 1207 can be configured to execute various program codes such as the methods such as described herein.
- the device 1200 comprises a memory 1211.
- the at least one processor 1207 is coupled to the memory 1211.
- the memory 1211 can be any suitable storage means.
- the memory 1211 comprises a program code section for storing program codes implementable upon the processor 1207.
- the memory 1211 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1207 whenever needed via the memory-processor coupling.
- the device 1200 comprises a user interface 1205.
- the user interface 1205 can be coupled in some embodiments to the processor 1207.
- the processor 1207 can control the operation of the user interface 1205 and receive inputs from the user interface 1205.
- the user interface 1205 can enable a user to input commands to the device 1200, for example via a keypad.
- the user interface 1205 can enable the user to obtain information from the device 1200.
- the user interface 1205 may comprise a display configured to display information from the device 1200 to the user.
- the user interface 1205 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1200 and further displaying information to the user of the device 1200.
- the user interface 1205 may be the user interface for communicating with the position determiner as described herein.
- the device 1200 comprises a transceiver 1209.
- the transceiver 1209 in such embodiments can be coupled to the processor 1207 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
- the transceiver 1209 or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
- the transceiver 1209 may be configured to communicate with the renderer as described herein.
- the transceiver 1209 can communicate with further apparatus by any suitable known communications protocol.
- the transceiver 1209 or transceiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
- UMTS universal mobile telecommunications system
- WLAN wireless local area network
- IRDA infrared data communication pathway
- the device 1200 may be employed as at least part of the renderer.
- the transceiver 1209 may be configured to receive the audio signals and positional information from the microphone/close microphones/position determiner as described herein, and generate a suitable audio signal rendering by using the processor 1207 executing suitable code.
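One simple way a renderer could turn such positional information into loudspeaker gains is a constant-power panning law. This is an assumption for illustration only (the patent does not prescribe a particular law); the sketch maps a source azimuth in [-45, +45] degrees to stereo gains.

```python
import math

def constant_power_pan(azimuth_deg):
    """Map a source azimuth in [-45, +45] degrees to (left, right)
    gains using the constant-power sine/cosine panning law, so that
    left**2 + right**2 == 1 at every position."""
    azimuth_deg = max(-45.0, min(45.0, azimuth_deg))
    # normalise the azimuth to a pan angle in [0, pi/2]
    theta = (azimuth_deg + 45.0) / 90.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

# -45 degrees is hard left, 0 degrees is centre, +45 degrees is hard right
left, right = constant_power_pan(0.0)
```

Constant-power laws are commonly preferred over linear crossfades because the perceived loudness stays roughly constant as the source moves.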
- the device 1200 may comprise a digital-to-analogue converter 1213.
- the digital-to-analogue converter 1213 may be coupled to the processor 1207 and/or memory 1211 and be configured to convert digital representations of audio signals (such as from the processor 1207 following an audio rendering of the audio signals as described herein) to an analogue format suitable for presentation via an audio subsystem output.
- the digital-to-analogue converter (DAC) 1213 or signal processing means can in some embodiments be any suitable DAC technology.
- the device 1200 can comprise in some embodiments an audio subsystem output 1215.
- The example shown in Figure 8 shows the audio subsystem output 1215 as an output socket configured to enable a coupling with headphones 121.
- the audio subsystem output 1215 may be any suitable audio output or a connection to an audio output.
- the audio subsystem output 1215 may be a connection to a multichannel speaker system.
- the digital to analogue converter 1213 and audio subsystem 1215 may be implemented within a physically separate output device.
- the DAC 1213 and audio subsystem 1215 may be implemented as cordless earphones communicating with the device 1200 via the transceiver 1209.
- While the device 1200 is shown having audio capture, audio processing and audio rendering components, it would be understood that in some embodiments the device 1200 can comprise just some of these elements.
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
- any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process.
- Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Abstract
An apparatus for producing at least two channel audio signals, each channel being associated with a channel position within a sound scene comprising at least one sound source, the apparatus being configured to: receive and/or determine at least one channel audio signal associated with the at least one sound source; produce at least one spatially extended audio signal based on the at least one channel audio signal; and combine the at least one channel audio signal and the at least one spatially extended channel audio signal to produce at least two output channel audio signals.
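The abstract's three-step pipeline (receive a channel audio signal, produce a spatially extended version, combine into output channels) can be sketched as follows. The delay-based decorrelator and the `mix` parameter are illustrative assumptions; the claims do not fix a particular spatial-extent method.

```python
import math

def spatially_extend(mono, n_ch=2, mix=0.5):
    """Sketch of the abstract's pipeline (with an assumed decorrelator):
    spread a mono source across n_ch channels using a different small
    delay per channel, then combine each spatially extended channel
    with the original channel audio signal."""
    out = []
    for ch in range(n_ch):
        delay = 8 * (ch + 1)                       # per-channel delay, an assumption
        extended = [0.0] * delay + mono[:-delay]   # delayed copy, zero-padded head
        # combine the channel audio signal with its spatially extended version
        combined = [(1.0 - mix) * d + mix * e for d, e in zip(mono, extended)]
        out.append(combined)
    return out

channels = spatially_extend([math.sin(0.05 * n) for n in range(480)], n_ch=2)
```

Because each output channel carries a slightly different (decorrelated) copy of the source, the apparent spatial extent of the source widens while its nominal channel position is preserved by the unprocessed component of the mix.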
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18787484.7A EP3613221A4 (fr) | 2017-04-20 | 2018-04-19 | Amélioration de lecture de haut-parleur à l'aide d'un signal audio traité en étendue spatiale |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1706288.6A GB2565747A (en) | 2017-04-20 | 2017-04-20 | Enhancing loudspeaker playback using a spatial extent processed audio signal |
GB1706288.6 | 2017-04-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018193163A1 true WO2018193163A1 (fr) | 2018-10-25 |
Family
ID=58795837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2018/050277 WO2018193163A1 (fr) | 2017-04-20 | 2018-04-19 | Amélioration de lecture de haut-parleur à l'aide d'un signal audio traité en étendue spatiale |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3613221A4 (fr) |
GB (1) | GB2565747A (fr) |
WO (1) | WO2018193163A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023131398A1 (fr) * | 2022-01-04 | 2023-07-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Appareil et procédé de mise en œuvre d'un rendu d'objet audio polyvalent |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150223002A1 (en) * | 2012-08-31 | 2015-08-06 | Dolby Laboratories Licensing Corporation | System for Rendering and Playback of Object Based Audio in Various Listening Environments |
US20160205491A1 (en) * | 2013-08-20 | 2016-07-14 | Harman Becker Automotive Systems Manufacturing Kft | A system for and a method of generating sound |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2343347B (en) * | 1998-06-20 | 2002-12-31 | Central Research Lab Ltd | A method of synthesising an audio signal |
US8488796B2 (en) * | 2006-08-08 | 2013-07-16 | Creative Technology Ltd | 3D audio renderer |
US20080298610A1 (en) * | 2007-05-30 | 2008-12-04 | Nokia Corporation | Parameter Space Re-Panning for Spatial Audio |
EP2154911A1 (fr) * | 2008-08-13 | 2010-02-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Appareil pour déterminer un signal audio multi-canal de sortie spatiale |
KR101844336B1 (ko) * | 2011-08-01 | 2018-04-02 | 삼성전자주식회사 | 공간감을 제공하는 신호 처리 장치 및 신호 처리 방법 |
- 2017-04-20 GB GB1706288.6A patent/GB2565747A/en not_active Withdrawn
- 2018-04-19 EP EP18787484.7A patent/EP3613221A4/fr active Pending
- 2018-04-19 WO PCT/FI2018/050277 patent/WO2018193163A1/fr unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP3613221A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2579348A (en) * | 2018-11-16 | 2020-06-24 | Nokia Technologies Oy | Audio processing |
WO2020152394A1 (fr) * | 2019-01-22 | 2020-07-30 | Nokia Technologies Oy | Représentation audio et rendu associé |
CN110267064A (zh) * | 2019-06-12 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | 音频播放状态处理方法、装置、设备及存储介质 |
CN110267064B (zh) * | 2019-06-12 | 2021-11-12 | 百度在线网络技术(北京)有限公司 | 音频播放状态处理方法、装置、设备及存储介质 |
CN114866948A (zh) * | 2022-04-26 | 2022-08-05 | 北京奇艺世纪科技有限公司 | 一种音频处理方法、装置、电子设备和可读存储介质 |
Also Published As
Publication number | Publication date |
---|---|
GB2565747A (en) | 2019-02-27 |
EP3613221A4 (fr) | 2021-01-13 |
EP3613221A1 (fr) | 2020-02-26 |
GB201706288D0 (en) | 2017-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6818841B2 (ja) | 少なくとも一つのフィードバック遅延ネットワークを使ったマルチチャネル・オーディオに応答したバイノーラル・オーディオの生成 | |
US11212638B2 (en) | Generating binaural audio in response to multi-channel audio using at least one feedback delay network | |
EP3311593B1 (fr) | Reproduction audio binaurale | |
KR101341523B1 (ko) | 스테레오 신호들로부터 멀티 채널 오디오 신호들을생성하는 방법 | |
JP4921470B2 (ja) | 頭部伝達関数を表すパラメータを生成及び処理する方法及び装置 | |
JP5955862B2 (ja) | 没入型オーディオ・レンダリング・システム | |
EP3122073B1 (fr) | Méthode et appareil de traitement de signal audio | |
EP3613221A1 (fr) | Amélioration de lecture de haut-parleur à l'aide d'un signal audio traité en étendue spatiale | |
US10531216B2 (en) | Synthesis of signals for immersive audio playback | |
CN113170271B (zh) | 用于处理立体声信号的方法和装置 | |
US10652686B2 (en) | Method of improving localization of surround sound | |
US11337020B2 (en) | Controlling rendering of a spatial audio scene | |
EP2484127B1 (fr) | Procédé, logiciel, et appareil pour traitement de signaux audio | |
JP2022502872A (ja) | 低音マネジメントのための方法及び装置 | |
WO2018193162A2 (fr) | Génération de signal audio pour mixage audio spatial | |
US11388540B2 (en) | Method for acoustically rendering the size of a sound source | |
EP3488623A1 (fr) | Regroupement d'objets audio sur la base d'une différence de perception sensible au dispositif de rendu | |
US11924623B2 (en) | Object-based audio spatializer | |
WO2018193160A1 (fr) | Génération d'ambiance pour mélange audio spatial comprenant l'utilisation de signal original et étendu | |
US11665498B2 (en) | Object-based audio spatializer | |
WO2018193161A1 (fr) | Extension spatiale dans le domaine d'élévation par extension spectrale | |
CN109121067B (zh) | 多声道响度均衡方法和设备 | |
US20240284132A1 (en) | Apparatus, Method or Computer Program for Synthesizing a Spatially Extended Sound Source Using Variance or Covariance Data | |
KR20060026234A (ko) | 입체 음향 재생 장치 및 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18787484 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018787484 Country of ref document: EP Effective date: 20191120 |