EP3461149A1

EP3461149A1 - An apparatus and associated methods for audio presented as spatial audio

Info

Publication number: EP3461149A1
Application number: EP17192113.3A
Authority: EP
Inventors: Sujeet Shyamsundar Mate; Lasse Laaksonen; Arto Lehtiniemi; Francesco Cricri
Original assignee: Nokia Technologies Oy
Current assignee: Nokia Technologies Oy
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2019-03-27
Also published as: WO2019057530A1

Abstract

An apparatus which, based on spatial audio content comprising at least one audio track comprising audio for audible presentation as spatial audio perceived to originate from a first audio-object location in a virtual space relative to a user location, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance; is configured to change the audible presentation of the first audio track to the user from spatial audio to monophonic/stereophonic audio for audio mixing of said first audio track, and wherein said presentation as monophonic/stereophonic audio is maintained irrespective of relative movement causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.

Description

Technical Field

The present disclosure relates to the field of spatial audio and, in particular, to the field of providing for audio mixing of spatial audio in a virtual space, associated methods, computer programs and apparatus.

Background

The augmentation of real-world environments with graphics and audio is becoming common, with augmented/virtual reality content creators providing more and more content for augmentation of the real-world as well as for virtual environments. The presentation of audio as spatial audio, which is such that that the audio is perceived to originate from a particular location in space, is useful for creating realistic augmented reality environments and virtual reality environments. The effective and efficient control of the presentation of spatial audio may be challenging.
The listing or discussion of a prior-published document or any background in this specification should not necessarily be taken as an acknowledgement that the document or background is part of the state of the art or is common general knowledge. One or more aspects/examples of the present disclosure may or may not address one or more of the background issues.

Summary

In a first example aspect there is provided an apparatus comprising:

at least one processor; and
at least one memory including computer program code,
the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
- based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;
- provide for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.

In one or more examples, based on the relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond a predetermined-stretched-bubble-distance which is greater than the predetermined-bubble-distance;

provide for a change in the audible presentation of the first audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location.

Accordingly, in one or more examples, relative movement between the audio-object location and the user location while the distance remains below the predetermined-stretched-bubble-distance does not affect the audible presentation of the first audio track in terms of one or more of the direction from which the audio of the first audio track is heard, the volume of the audio of the first audio track, application of a Room Impulse Response function to the first audio track, and the degree to which the audio of the first audio track is presented to left and right channels of an audio presentation device configured to provide the audio presented on the channels to left and right ears of the user.
In one or more examples, one or both of:

i) user initiated movement of their user location in the virtual space; and
ii) movement of the audio-object in the virtual space;

In one or more examples, the spatial audio content comprises a plurality of audio tracks including at least the first audio track and a second audio track, the second audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a second audio-object location in the virtual space; and

based on the presentation of said first audio track as at least one of monophonic and stereophonic audio, provide for audible presentation of the second audio track as spatial audio such that the audio of the second audio track is perceived from a direction based on the second audio-object location relative to the user location.

In one or more examples, the apparatus is caused to provide for audible presentation of the spatial audio content and simultaneous audible presentation of ambience audio content, the ambience audio content comprising audio for audible presentation that is not perceived to originate from a particular direction in the virtual space. In one or more examples, the ambience audio content presented to the user is a function of location in the virtual space, the apparatus caused to provide for one or more of:

i) said simultaneous audible presentation of ambience audio content based on the user-location;
ii) said simultaneous audible presentation of ambience audio content based on the first audio object location when providing for presentation of the audio the first audio object monophonically or stereophonically;
iii) ceasing of presentation of ambience audio content. when providing for presentation of the audio the first audio object monophonically or stereophonically.

In one or more examples, the spatial audio content comprises part of virtual reality content for viewing in virtual reality, the virtual reality content comprising visual content for display in the three-dimensional virtual space and the one or more audio tracks of the spatial audio content configured for presentation from one or more respective audio-object locations, at least a subset of the audio-object locations corresponding to features in the visual content.
In one or more examples, the spatial audio content comprises augmented reality content, the virtual space corresponding to a real-world space in which the user is located such that a location of the user in the real-world space corresponds to the user location in the virtual space and the audio-object location in the virtual space corresponds to a real-world-audio-object location in the real-world space.
In one or more examples, the spatial audio content comprises mixed reality content, the virtual space corresponding to a real-world space in which the user is located such that a location of the user in the real-world space corresponds to the user location in the virtual space and the audio-object location in the virtual space corresponds to a real-world-audio-object location in the real-world space.
In one or more examples, with the first audio track audibly presented as one of monophonic and stereophonic audio, based on user input indicative of a desire to return to spatial audio presentation of the first audio track, provide for a change in the audible presentation of the first audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location.
In one or more examples, the direction is based on the first audio-object location at the time of said user input or a current first audio-object location.
In one or more examples, in addition to the apparatus being caused to provide for a change in the audible presentation of the first audio track, the apparatus is caused to provide an audio mixing user interface for modification of one or more audio parameters of the first audio track.
In one or more examples, the one or more audio parameters comprise at least one or more of: volume, bass level, mid-tone level, treble-level, reverberation level and echo.
In one or more examples, based on receipt of user input to the audio mixing user interface causing changes to one or more audio parameters of the first audio track and, subsequently, at least one of:

i) relative movement between the user location and the audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond a predetermined-stretched-bubble-distance which is greater than the predetermined-bubble-distance;
ii) user input indicative of a desire to return to spatial audio presentation of the first audio track;

a) discarding of the changes to one or more audio parameters of the first audio track unless a user initiated save input is received; and
b) a change in the audible presentation of the first audio track to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location with the changes to the one or more audio parameters applied.

In one or more examples, between the audible presentation of the first audio track to the user as at least one of monophonic and stereophonic audio and the presentation of the first audio track as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location, the apparatus is caused to provide for audible presentation of the first audio track with a transitional spatial audio effect comprising the perceived origin of the audio of first audio track progressively moving away from the user location to the current first audio-object location.
In one or more examples, based on the relative movement between the user location and the audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase to within a threshold of the predetermined-stretched-bubble-distance, provide for audible presentation of the first audio track to the user as at least one of monophonic and stereophonic audio with an audio effect to thereby audibly indicate that the user is approaching the predetermined-stretched-bubble-distance.
In one or more examples, when presenting one of a plurality of audio tracks of the spatial audio content monophonically or stereophonically with the other audio tracks presented as spatial audio, the apparatus is caused to apply a room impulse response function to at least one of said other audio tracks, the room impulse response function configured to modify the audio of the at least one other audio tracks to sound as if it is heard in a predetermined room with a particular location in said predetermined room, the particular location in said predetermined room based on either:

i) the user location; and
ii) a current location of the first audio object.

In one or more examples, the apparatus is caused to provide for user-selection of one of:

a) the Room Impulse Response function based on the user location; and
b) the Room Impulse Response function based on the current location of the first audio object.

In one or more examples, the spatial audio content is audibly presented as spatial audio by processing the first audio track using one or more of:

i) a head-related-transfer-function filtering technique;
ii) a vector-base-amplitude panning technique; and
iii) binaural audio presentation.

In one or more examples, when the first audio track is presented as spatial audio, signalling indicative of movement of the user provides for modification of one or more of the direction from which the audio track is perceived to originate relative to the user's head and its volume; and

when the first audio track is presented as at least one of monophonic and stereophonic audio, signalling indicative of movement of the user does not provide for modification of one or more of the direction from which the audio track is perceived to originate relative to the user's head and its volume.

In a further aspect there is provided a method, the method comprising:

based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;
providing for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.

In a further aspect there is provided a computer readable medium comprising computer program code stored thereon, the computer readable medium and computer program code being configured to, when run on at least one processor, perform the method of:

In a further aspect there is provided an apparatus, the apparatus comprising means configured to;

based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;
provide for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.

The present disclosure includes one or more corresponding aspects, examples or features in isolation or in various combinations whether or not specifically stated (including claimed) in that combination or in isolation. Corresponding means and corresponding functional units (e.g., function enabler, AR/VR graphic renderer, display device) for performing one or more of the discussed functions are also within the present disclosure.
Corresponding computer programs for implementing one or more of the methods disclosed are also within the present disclosure and encompassed by one or more of the described examples.
The above summary is intended to be merely exemplary and non-limiting.

Brief Description of the Figures

A description is now given, by way of example only, with reference to the accompanying drawings, in which:

figure 1 illustrates an example apparatus for providing for a change in the audible presentation of an audio track;
figure 2 shows an example virtual space showing the location of a user and five audio objects each having an audio track associated therewith and the location relative to the user illustrating where in the virtual space the user perceives the audio tracks as originating;
figure 3 shows the example virtual space as figure 2 with a bubble around each audio object illustrating the predetermined-bubble-distance;
figure 4 shows the user approaching the predetermined-bubble-distance of one of the audio objects;
figure 5 shows the user having moved such that the distance between the user's location and the one of the audio-object locations has decreased to less than the predetermined-bubble-distance and the apparatus has provided for a change in the audible presentation of the first audio track to the user;
figure 6 shows the situation illustrated in figure 5 except the audio object has moved in the virtual space relative to the user;
figure 7 shows the situation illustrated in figure 5 except the user has moved relative to the audio object in the virtual space;
figure 8 shows an example application of room impulse response processing to the audio track;
figure 9 shows an example in which relative movement of the user and the audio object away from one another has caused the distance between them to approach a predetermined-stretched-bubble-distance;
figure 10 shows an example in which the user and audio object have moved apart beyond the stretched-bubble-distance and the apparatus provides for a change in presentation of the first audio track back to externalized (relative to the user's head) spatial audio rather than non-externalized monophonic/stereophonic audio;
figure 11 shows a flowchart illustrating an example method; and
figure 12 shows a computer readable medium.

Description of Example Aspects

Virtual reality (VR) may use a VR display comprising a headset, such as glasses or goggles or virtual retinal display, or one or more display screens that surround a user to provide the user with an immersive virtual experience. A virtual reality apparatus, which may or may not include the VR display, may provide for presentation of multimedia VR content representative of a virtual reality scene to a user to simulate the user being present within the virtual reality scene. Accordingly, in one or more examples, the VR apparatus may provide signalling to a VR display for display of the VR content to a user while in one or more other examples, the VR apparatus may be part of the VR display, e.g. part of the headset. The virtual reality scene may therefore comprise the VR content displayed within a three-dimensional virtual reality space so that the user feels immersed in the scene, as if they were there, and may look around the VR space at the VR content displayed around them. The virtual reality scene may replicate a real world scene to simulate the user being physically present at a real world location or the virtual reality scene may be computer generated or a combination of computer generated and real world multimedia content. Thus, the VR content may be considered to comprise the imagery (e.g. static or video imagery), audio and/or accompanying data from which a virtual reality scene may be generated for display. The VR apparatus may therefore provide the VR scene by generating the virtual, three-dimensional, VR space in which to display the VR content. The virtual reality scene may be provided by a panoramic video (such as a panoramic live broadcast), comprising a video having a wide or 360° field of view (or more, such as above and/or below a horizontally oriented field of view). A panoramic video may have a wide field of view in that it has a spatial extent greater than a field of view of a user or greater than a field of view with which the panoramic video is intended to be displayed.
The VR content provided to the user may comprise live or recorded images of the real world, captured by a VR content capture device, for example. An example VR content capture device comprises a Nokia Technologies OZO device. As the VR scene is typically larger than a portion a user can view with the VR display, the VR apparatus may provide, for display on the VR display, a virtual reality view of the VR scene to a user, the VR view showing only a spatial portion of the VR content that is viewable at any one time. The VR apparatus may provide for panning around of the VR view in the VR scene based on movement of a user's head and/or eyes. A VR content capture device may be configured to capture VR content for display to one or more users. A VR content capture device may comprise one or more cameras and, optionally, one or more (e.g. directional) microphones configured to capture the surrounding visual and aural scene from a capture point of view. In some examples, the VR content capture device comprises multiple, physically separate cameras and/or microphones. Thus, a musical performance may be captured (and recorded) using a VR content capture device, which may be placed on stage, with the performers moving around it or from the point of view of an audience member. In each case a consumer of the VR content may be able to look around using the VR display of the VR apparatus to experience the performance at the capture location as if they were present.
Augmented reality (AR) may use an AR display, such as glasses or goggles or a virtual retinal display, to augment a view of the real world (such as seen through the glasses or goggles) with computer generated content. An augmented reality apparatus, which may or may not include an AR display, may provide for presentation of multimedia AR content configured to be overlaid over the user's view of the real-world. Thus, a user of augmented reality may be able to view the real world environment around them, which is augmented or supplemented with content provided by the augmented reality apparatus, which may be overlaid on their view of the real world and/or aurally overlaid over an aural real world scene they can hear. The content may comprise multimedia content such as pictures, photographs, video, diagrams, textual information, aural content among others. Thus, while augmented reality may provide for direct viewing of the real world with the addition of computer generated graphics and/or audio content, a user of virtual reality may only be able to see content presented on the VR display of the virtual reality apparatus substantially without direct viewing of the real world.
In addition to the audio received from the microphone(s) of the VR content capture device further microphones each associated with a distinct audio source may be provided. In one or more examples, the VR content capture device may not have microphones and the aural scene may be captured by microphones remote from the VR content capture device. Thus, microphones may be provided at one or more locations within the real world scene captured by the VR content capture device, each configured to capture audio from a distinct audio source. For example, using the musical performance example, a musical performer or a presenter may have a personal microphone. Knowledge of the location of each distinct audio source may be obtained by using transmitters/receivers or identification tags to track the position of the audio sources, such as relative to the VR content capture device, in the scene captured by the VR content capture device. Thus, the VR content may comprise the visual imagery captured by one or more VR content capture devices and the audio captured by the one or more VR content capture devices and, optionally/alternatively, one or more further microphones. The location of the further microphones may be provided for providing spatial audio.
The virtual reality content may comprise, and a VR apparatus presenting said VR content may provide, predefined-viewing-location VR or free-viewing-location VR. In predefined-viewing-location VR, the location of the user in the virtual reality space may be fixed or follow a predefined path. Accordingly, a user may be free to change their viewing direction with respect to the virtual reality imagery provided for display around them in the virtual reality space, but they may not be free to arbitrarily change their viewing location in the VR space to explore the VR space. Thus, the user may experience such VR content from a fixed point of view or viewing location (or a limited number of locations based on where the VR content capture devices were located in the scene). In some examples of predefined-viewing-location VR the imagery may be considered to move past them. In predefined-viewing-location VR content captured of the real world, the user may be provided with the point of view of the VR content capture device. Predefined-viewing-location VR content may provide the user with three degrees of freedom in the VR space comprising rotation of the viewing direction around any one of x, y and z axes and may therefore be known as three degrees of freedom VR (3DoF VR).
In free-viewing-location VR, the VR content and VR apparatus presenting said VR content may enable a user to be free to explore the virtual reality space. Thus, the user may be provided with a free point of view or viewing location in the virtual reality space. Free-viewing-location VR is also known as six degrees of freedom (6DoF) VR or volumetric VR to those skilled in the art. Thus, in 6DoF VR the user may be free to look in different directions around the VR space by modification of their viewing direction and also free to change their viewing location (their virtual location) in the VR space by translation along any one of orthogonal x, y and z axes. The movement available in a 6DoF virtual reality space may be divided into two categories: rotational and translational movement (with three degrees of freedom each). Rotational movement enables a user to turn their head to change their viewing direction. The three rotational movements are around x-axis (roll), around y-axis (pitch), and around z-axis (yaw). Translational movement means that the user may also change their point of view in the space to view the VR space from a different virtual location, i.e., move along the x, y, and z axes according to their wishes. The translational movements may be referred to as surge (x), sway (y), and heave (z) using the terms derived from ship motions.
Mixed reality comprises a combination of augmented and virtual reality in which a three-dimensional model of the real-world environment is used to enable virtual objects to appear to interact with real-world objects in terms of one or more of their movement and appearance.
One or more examples described herein relate to 6DoF virtual reality content in which the user is at least substantially free to move in the virtual space either by user-input through physically moving or, for example, via a dedicated user interface (Ul).
Spatial audio comprises audio presented in such a way to a user that it is perceived to originate from a particular location, as if the source of the audio was located at that particular location. Thus, virtual reality content may be provided with spatial audio having directional properties, such that the audio is perceived to originate from a point in the VR space, which may be linked to the imagery of the VR content. Augmented reality may be provided with spatial audio, such that the spatial audio is perceived as originating from real world objects visible to the user and/or from augmented reality graphics overlaid over the user's view.
Spatial audio may be presented independently of visual virtual reality or visual augmented reality content. Nevertheless, spatial audio, in some examples, may be considered to be augmented reality content because it augments the aural scene perceived by a user. As an example of independent presentation of spatial audio, a user may wear headphones and, as they explore the real world, they may be presented with spatial audio such that the audio appears to originate at particular locations associated with real world objects or locations. For example, a city tour could be provided by a device that tracks the location of the user in the city and presents audio describing points of interest as spatial audio such that the audio appears to originate from the point of interest around the user's location.
The spatial positioning of the spatial audio may be provided by 3D audio effects, such as those that utilise a head related transfer function to create a spatial audio space in which audio can be positioned for presentation to a user. Spatial audio may be presented by headphones by using head-related-transfer-function (HRTF) filtering techniques or, for loudspeakers, by using vector-base-amplitude panning techniques to position the perceived aural origin of the audio content. In other embodiments ambisonic audio presentation may be used to present spatial audio. Spatial audio may use one or more of volume differences, timing differences and pitch differences between audible presentation to each of a user's ears to create the perception that the origin of the audio is at a particular location in space.
In some examples, an audio track, which comprises audio content for presentation to a user, may be provided for presentation as spatial audio. Accordingly, the audio track may be associated with a particular location which defines where the user should perceive the audio of the audio track as originating. The particular location may be defined relative to a virtual space or a real-world space. The virtual space may comprise a three-dimensional environment that at least partially surrounds the user and may be explorable by the user. The virtual space may be explorable in terms of the user being able to move about the virtual space by at least translational movement based on user input. If the spatial audio is provided with virtual reality content, virtual reality imagery may be displayed in the virtual space along with spatial audio to create a virtual reality experience. If the spatial audio is provided with visual augmented reality content or independently of augmented or virtual reality content, the particular location may be defined relative to a location in the real world, such as in a real-world room or city. In one or more examples, the virtual space may be configured to correspond to the real world space in which the user is located. Accordingly, the virtual space may be used to determine the interaction between real world objects and locations and virtual objects and locations.
In one or more examples, the audio track and location information may be considered to define an audio object in the virtual space comprising a source location for the audio of an associated audio track. The audio objects may be moveable or non-movable. The audio objects may or may not have a corresponding visual object.
Figure 1 shows an example system 100 for presentation of spatial audio content to a user. The system 100 includes an example apparatus 101 for controlling the presentation of audio tracks of the spatial audio content based on the user's location relative to the audio objects and movement of one or both of the user and the audio object relative to the other. The apparatus 101 may comprise or be connected to a processor 101A and a memory 101B and may be configured to execute computer program code. The apparatus 101 may have only one processor 101 A and one memory 101B but it will be appreciated that other embodiments may utilise more than one processor and/or more than one memory (e.g. same or different processor/memory types). Further, the apparatus 101 may be an Application Specific Integrated Circuit (ASIC).
The processor may be a general purpose processor dedicated to executing/processing information received from other components, such as from a location tracker 102 and a content store 103, in accordance with instructions stored in the form of computer program code in the memory. The output signalling generated by such operations of the processor is provided onwards to further components, such as to audio presentation equipment, such as headphones 108.
The memory 101B (not necessarily a single memory unit) is a computer readable medium (solid state memory in this example, but may be other types of memory such as a hard drive, ROM, RAM, Flash or the like) that stores computer program code. This computer program code stores instructions that are executable by the processor, when the program code is run on the processor. The internal connections between the memory and the processor can be understood to, in one or more example embodiments, provide an active coupling between the processor and the memory to allow the processor to access the computer program code stored on the memory.
In this example the respective processors and memories are electrically connected to one another internally to allow for electrical communication between the respective components. In this example the components are all located proximate to one another so as to be formed together as an ASIC, in other words, so as to be integrated together as a single chip/circuit that can be installed into an electronic device. In some examples one or more or all of the components may be located separately from one another.
The apparatus 101, in this example, forms part of a virtual reality apparatus 104 for presenting visual imagery in virtual reality. In one or more other examples, the apparatus 101 may form part of an AR apparatus. In one or more examples, the apparatus 100 may be independent of an AR or VR apparatus and may provide signalling to audio presentation equipment 108 (such as speakers, which may be incorporated in headphones) for presenting the audio to the user. In this example, the processor 101A and memory 101B is shared by the VR apparatus 104 and the apparatus 101, but in other examples, they may have their own processors and/or memory.
The VR apparatus 104 may provide for display of virtual reality content comprising visual imagery displayed in a virtual space that is viewable by a user using the VR headset 107. In one or more examples in which the apparatus 100 is independent of an AR or VR apparatus, the VR headset 107 may not be required and instead only the audio presentation equipment 108 may be provided.
The apparatus 101 or the VR apparatus 104 under the control of the apparatus 101 may provide for aural presentation of the audio tracks to the user using the headphones 108. The apparatus 101 may be configured to process the audio such that, at any one time, it is presented as one of spatial, monophonic and stereophonic audio or, alternatively or in addition, the apparatus 101 may provide signalling to control the processing and/or presentation of the audio tracks. Accordingly, an audio processor (not shown) may perform the audio processing in order to present the audio in the ways mentioned above under the control of the apparatus 101.
The apparatus 101 may receive signalling indicative of the location of the user from a location tracker 102. The location tracker 102 may determine the user's head orientation and/or the user's location in the real world so that the spatial audio may be presented to take account of head rotation and movement so that the audio is perceived to originate from a direction relative to the user irrespective of user's head movement. If the spatial audio is provided in a virtual reality environment, the location tracker 102 may provide signalling indicative of user movement so that corresponding changes in the user's virtual location in the virtual space can be made.
In the examples that follow, spatial audio content comprising one or more audio tracks, which may be provided from content store 103, may be processed such that they are presented to the user as spatial audio or stereophonic or monophonic audio. Accordingly, in a first instance, the audio track may be presented as spatial audio and as such may undergo audio processing such that it is perceived to originate from a particular location. In a second instance, the same audio track may be presented as monophonic audio and as such may undergo audio processing (if required) such that the audio is presented monophonically to one or both of a left and right speaker associated with the left and right ears of the user. In a third instance, the same audio track may be presented as stereophonic audio (if required) and as such may undergo audio processing such that the audio of the audio track is presented to one or both of a left and right speaker associated with the left and right ear of the user respectively (or even in between the two ears). Monophonic audio, when presented to two speakers provides the same audio to both ears. Stereophonic audio may define two (left and right) or three (left, right, centre) stereo audio channels and the audio of the audio track may be presented to one or more of those channels. In some examples, the difference between stereophonic presentation and spatial audio presentation may be, for spatial audio, the use of a time delay between corresponding audio being presented to speakers associated with a respective left and right ear of the user and, for stereophonic presentation, the non-use of said time delay. It will be appreciated that the presentation of spatial audio may additionally use other presentation effects in addition to differences in the time that corresponding portions of the audio is presented to the user's ears to create the perception of a direction or location from which the audio is heard, such as volume differences amongst others.
While the same audio track may undergo audio processing in order to provide for its presentation as spatial audio or stereophonic or monophonic audio, as described above, in one or more other examples, the audio tracks may be pre-processed and may thus include different versions for presentation as spatial audio or stereophonic or monophonic audio. In one or more examples, the presentation of an audio track as spatial audio may decrease its fidelity and thus, presentation as monophonic or stereophonic audio may provide for an increase in audio quality.
Figure 2 shows an example virtual space 200. The virtual space may comprise a three-dimensional virtual environment in which the location of a user, comprising user-location 201, is defined as well as the location of one or more audio objects 202-206. In this example, the spatial audio content that defines the audio objects is part of virtual reality content, which includes visual content to accompany the spatial audio content, although in other examples the audio objects may not have corresponding visual objects. In this example, the location of the user is shown diagrammatically by an image of a person but it will be appreciated that in one or more examples, the user location 201 designates the location in the virtual space at which the user perceives the audio presented to them, i.e. a "point-of-hearing location" similar to a "point-of-view location". The first through fifth audio objects 202-206 are illustrated diagrammatically by their corresponding visual representations. Thus, the first audio object 202 represents the audio from a first drummer who appears in the visual content. The second audio object 203 represents the audio from a second drummer who appears in the visual content. The third audio object 204 represents the audio from a guitarist who appears in the visual content. The fourth audio object 205 represents the audio from a ballerina who appears in the visual content. The fifth audio object 206 represents the audio from a singer who appears in the visual content. In this example, at least the ballerina may move about the virtual space 200 and accordingly the visual imagery of the ballerina and the audio object may, correspondingly, move with elapsed time through the virtual reality content.
Each of the audio objects 202-206 may be associated with an audio track which may be presented as spatial audio to the user. Accordingly, the first audio object 202 defines the location of the perceived origin of an associated first audio track as perceived by the user from their user location 201. Likewise, the audio of the audio tracks associated with the second to fifth audio objects is presented to the user such that the user perceives the origin of the second to fifth audio tracks as originating at the location of the respective audio objects relative to the user location 201.
As will be appreciated when the audio is presented as spatial audio, when the user changes their location in the virtual space 200, there is a corresponding change in the presentation of the audio track as spatial audio. For example, the volume of the audio track presented to the user may be a function of the distance of the user location 201 from the corresponding audio object location. Thus, in one or more examples, as the user moves towards the audio object location the spatial audio presented audio track is presented louder and as the user moves away the audio track is presented more quietly. Also, as the user moves their head, the direction (relative to the user's head) from which the spatial audio is perceived to originate changes in accordance with the direction to the audio object location relative to the user's direction of view.
In this examples, the virtual reality content is free-viewing-location VR or "6DoF" and therefore the apparatus 101 or VR apparatus 104 may be configured to visually and aurally present the visual imagery and spatial audio content in accordance with user-input to move their user location in the virtual space 200 as well as their viewing direction.
While in these examples the spatial audio content may be associated with virtual reality content, in one or more examples, the spatial audio content comprises augmented reality content and the virtual space 200 corresponding to a real-world space in which the user is located such that a location of the user in the real-world space corresponds to the user location in the virtual space and the audio-object location in the virtual space corresponds to a real-world-audio-object location in the real-world space. Thus, in general, the apparatus 101 may be configured to provide control of the presentation of spatial audio in a virtual or augmented reality environment.
In one or more examples, the user may wish to control the way in which the spatial audio is presented. Accordingly, the user may wish to control audio parameters of the audio tracks associated with the audio objects 202-206, such as in terms of their relative volume, frequency dependent gain applied or other audio presentation parameter, for example. The term "audio mixing" may be used to refer to the control of one or more audio parameters associated with the audible presentation of the audio track. The change in audio parameters may then be applied when the spatial audio content is later consumed by the same or a different user or may be applied to live content that is then provided for sending to multiple consumers of VR content.
In one or more examples, the user 201 may comprise a content producer and the apparatus 101 may provide a spatial audio mixing apparatus for spatial audio content production. In one or more examples, the user 201 may comprise a consumer of spatial audio content, or VR/AR content that includes spatial audio content, and the apparatus 101 may comprise at least part of a spatial audio presentation apparatus.
In one or more examples, the user may wish to select one of the audio objects 202-206 on which to perform audio mixing by moving about the virtual space such that the user-location is within a threshold distance of the audio object. With the audio object selected in this way the user may be provided with an audio mixing interface to provide for control of one or more audio parameters of the audio track associated with the selected audio object 202-206. However, as mentioned above, if the user can move about the virtual space 200 and, alternatively or in addition, the audio object 205 may move about the virtual space 200, maintaining selection of the audio object with such relative movement may be problematic.
In one or more examples, if the user wishes to maintain selection of a moving audio object, the user may have to inconveniently change their location to keep track of the audio object. If the control of user location in the virtual space 200 is controlled by tracking physical user movement, this may be physically tiring. In other examples, not comprising a function of the apparatus 101, there may be a simplistic latching or locking onto the audio object. However, in such a situation, the user's point of view in the virtual space is limited to that controlled by the movements of the object of interest, which may be confusing.
The example apparatus 101, as explained below, provides a way of maintaining selection of an audio object irrespective of at least some relative movement between the user and the audio object. With said selection, the audio parameters that affect how the audio of the audio track associated with the selected audio object is presented may be more readily controlled.
The example figure 3 shows the same virtual space 200 as figure 2 alongside a plan view 300 of the virtual space 200. The audio objects 202-206 and the user location 201 are also shown.
In this example, the audio objects are shown having a bubble 302-306 surrounding them. The bubbles may, in one or more examples, represent a predetermined-bubble-distance 307 that extends around each audio object 202-206. The predetermined-bubble-distance 307 may comprise a predetermined threshold distance which may be used as a means for selecting the audio objects when the user 201 location is within said predetermined-bubble-distance 307 of a particular audio object 202-206. In one or more embodiments, the predetermined-bubble-distance may define a spherical volume around the audio objects to be used for selection thereof. In other embodiments, the predetermined-bubble-distance may be different depending on the angle of approach towards said audio object. In one or more examples, the predetermined bubble-distance may extend from a centre of the audio object or a visual object associated with the audio object to define the bubble. In other examples, the predetermined bubble-distance may extend from a surface or a bounding geometrical volume (e.g. bounding cuboid or bounding ellipsoid) of the audio object or a visual object associated with the audio object. Accordingly, the bubble 302-306 may be non-spherical and may generally follow the shape and extent of the visual object or the shape and extent of the audio object (if the audio object is not a point source for the audio). The bubbles 302-306 may or may not be visible. In one or more examples, only a subset of the audio objects may be selectable and thus only a subset may have an associated bubble. Whether or not an audio object is selectable or not may be defined in information associated with the spatial audio content.
The distance between the user 201 location and the fourth audio-object 205 location is shown by arrow 308 and it is greater than the predetermined-bubble-distance 307.
The example of figure 4 happens to show the selection of the fourth audio object 205 by virtue of the user moving to a new user 201 location in the virtual space shown by arrow 400. It will be understood that the user 201 could have approached any other of the audio objects. The old user location 401 is shown for understanding. In such a location, the distance between the user 201 location and the fourth audio-object 205 location, shown by arrow 308, has decreased from the distance shown in figure 3 to less than the predetermined-bubble-distance 307. Accordingly, the apparatus 101 may, based on this condition being met, provide for selection of the fourth audio object 205. In this configuration, the user may be considered to be within the bubble 305.
Thus, the apparatus 101, based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from an fourth audio-object 205 location in the virtual space 200, the user having a user 201 location in the virtual space 200 from which to perceive the spatial audio and based on the distance 308 between the user 201 location and the fourth audio-object 205 location having decreased to less than the predetermined-bubble-distance 307 has provided for selection of the audio object. Selection of the fourth audio object 205 is configured to provide for a change in the audible presentation of the fourth audio track associated with the fourth audio object to the user from presentation as spatial audio to presentation as one of monophonic and stereophonic audio.
In one or more examples, the apparatus 101 may require additional conditions to be met before selecting one of the audio objects. For example, the user may be required to approach the fourth audio object 205 below the predetermined-bubble-distance 307 and provide a user input before the apparatus provides for selection of the fourth audio object 205.
The example of figure 5 shows the change in audible presentation of the audio tracks associated with the first to fifth audio objects 202-206. In the plan view of figure 5 the first, second, third and fifth audio objects are shown with arrows 502, 503, 504, 506 to indicate the direction from which the user perceives the spatial audio of the audio track associated with those audio objects 202, 203, 204, 206. Accordingly, the audio of the audio tracks of the first, second, third and fifth audio objects is heard externalized, that is from a particular direction or location in the virtual space. The audio track 507 of the selected fourth audio object 205 is shown positioned within the user's head and is shown as a circle in figure 5. This signifies that the audio of the fourth audio track is rendered as non-externalized in-head audio, which comprises one of monophonic audio or stereophonic audio. The fourth audio object 205 is not shown in the plan view 300 as it would be overlapping with the user's 201 head of the plan view designating the user location.
The selection of the fourth audio object 205 may provide for audio mixing of audio track associated with that audio object. Figure 5 shows an example audio mixing interface 500 that may be visually presented for audio mixing. The apparatus 101 may be configured to receive user-input via said audio mixing interface for modification of audio parameters associated with the selected fourth audio track. In other examples, no visual interface is provided and instead other user input, such as predetermined gestures or voice commands may provide for control of one or more audio parameters.
In one or more examples, the apparatus 101 may be caused to provide the audio mixing user interface 500 for modification of one or more audio parameters of only the selected audio track. The one or more audio parameters may be selected from at least one or more of: volume, bass level, mid-tone level, treble-level, reverberation level and echo among others. The audio mixing user interface 500 may be visually presented and include sliders or other control interfaces for each audio parameter. In other examples, the audio mixing user interface may not be visualized and predetermined user gestures may control predetermined audio parameters. For example, rotation of the user's left hand may control volume level while rotation of the user's right hand may control reverberation. It will be appreciated that other gesture/audio parameter combinations are possible.
Example figures 6 and 7 illustrate relative movement between the user 201 location and the location of the fourth audio object 205 in the virtual space 200 causing the distance 308 between the user 201 location and the location of the fourth audio object 205 to increase. Relative movement of the audio objects and/or the user may make maintaining selection of one of the audio objects difficult.
Figure 6 shows the fourth audio object 205 moving away from the user 201 location. The apparatus 101 is configured to maintain the presentation of the audio of the fourth audio track 507 as monophonic or stereophonic audio, which may be advantageous for audio mixing, irrespective of movement of the fourth audio object 205 away from the user 201 location in the virtual space causing the distance 308 between the user location and the fourth audio-object 205 location to increase beyond the predetermined-bubble-distance 307.
As shown in figure 6, this may be compared to the bubble 305 stretching beyond its original size, the original size comprising the size and/or shape prior to the user "entering the bubble". Accordingly, in one or more examples, the user may continue to benefit from higher audio quality provided by the monophonic/stereophonic presentation of the fourth audio track and thus avoid distractions caused by changes in spatial audio rendering due to relative movement of the audio objects and user location.
The apparatus 101 may be configured to provide for audible presentation of the audio tracks associated with the other, unselected audio objects 202, 203, 204, 206 as spatial audio and therefore they are perceived as originating from their respective locations in the virtual space.
It will be appreciated that in one or more examples, ambience audio content may be provided, in addition, and the apparatus may provide for presentation of the ambience audio content.
The ambience audio content may comprise audio that does not have location information such that it may not be presented as originating from a particular point in space and is presented as ambience.
Figure 7 shows the user moving in the virtual space such that the user 201 location is moving away from the fourth audio object 205. The apparatus 101 is configured to maintain selection and thus presentation of the audio of the fourth audio track 507 as monophonic or stereophonic audio, which may be advantageous for audio mixing, irrespective of movement of the user 201 location away from the fourth audio object 205 in the virtual space causing the distance 308 between the user location and the fourth audio-object 205 location to increase beyond the predetermined-bubble-distance 307.
As shown in figure 7 and similar to figure 6, this may be compared to the bubble 305 stretching beyond its original size. Accordingly, in one or more examples, the user may continue to benefit from higher audio quality provided by the monophonic/stereophonic presentation of the fourth audio track and thus avoid distractions caused by changes in spatial audio rendering due to relative movement of the audio objects and user location.
As described above in relation to figure 6, the apparatus 101 may be configured to provide for audible presentation of the audio tracks associated with the other, unselected audio objects 202, 203, 204, 206 as spatial audio and therefore they are presented such that they are perceived as originating from their respective locations in the virtual space relative to the user 201 location that the user has moved to. As in the previous example, the apparatus may additionally provide for presentation of the ambience audio content.
The provision of selection of an audio object based on a distance between the user and the audio object decreasing to within the predetermined-bubble-distance, i.e. the user moving near to or bumping into the audio object, and maintaining selection even if subsequent movement of the user and/or audio object increases said distance above the predetermined-bubble-distance may have technical advantages. The maintenance of selection may allow for less exertion by the user to "track" the selected, moving, audio object. Further, selection of the audio objects in this way may allow for presentation of a more stable audio scene containing other audio objects because the user does not have to move to maintain selection and thus the direction from which the user perceives other spatial audio in the scene does not have to be modified based on the moving user location. As described in these examples, the selection of the audio object provide for presentation of the audio of the audio object from its default presentation as spatial audio to one of stereophonic/monophonic audio, which may be technically advantageous for audio mixing. Further, given that the selected audio object is not presented as spatial audio, movement of the user and/or audio object in the virtual space does not lead to an audible distraction or modification which would occur with presentation as spatial audio.
Example figures 8 and 9 show the bubble 305 stretching and breaking. The breaking of the bubble 305 is symbolic of the apparatus 101 being configured to provide for a change in the audible presentation of the fourth audio track 507 to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio.
When the bubble 305 breaks may be determined based on a further threshold distance between the user 201 location and the selected audio object 205. The threshold distance may be termed a predetermined-stretched-bubble-distance which is greater than the predetermined-bubble-distance.
Figure 8 shows the relative movement between the user 201 location and the current location of the fourth audio object 205 in the virtual space 200 causing the distance 308 between the user location and the first audio-object location to approach the predetermined-stretched-bubble-distance 800. In one or more examples an audio effect may be applied to the monophonically or stereophonically presented fourth audio track to audible indicate that the predetermined-stretched-bubble-distance 800 is almost reached. Thus, within a threshold below the predetermined-stretched-bubble-distance 800 one or more of an audible, visual and haptic feedback may be provided to the user. The visual feedback may comprise a message or graphic. The audio feedback, may comprise a spoken message or an audio effect which may comprise an echo effect or reverberation effect or an underwater effect or a distinguishable audio tone.
Figure 9 shows the relative movement between the user 201 location and the location of the fourth audio-object 205 having caused the distance 308 between the user location and the first audio-object location to have increased beyond the predetermined-stretched-bubble-distance 800. Accordingly, with reference to the plan view 900, the apparatus 101 may provide for a change in the audible presentation of the fourth audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio such that the audio of the fourth audio track is perceived from a direction 901 based on the current location of the fourth audio object 205 relative to the current user location 201. The other audio objects 202, 203, 204, 206 may continue to be presented as spatial audio.
Accordingly, relative movement between the audio object location and the user 201 location while the distance remains below the predetermined-stretched-bubble-distance may not affect the audible presentation of the fourth audio track because it is presented as monophonic or stereophonic audio. However, once the bubble breaks and the apparatus provides for presentation of the fourth audio track as spatial audio, then relative movement between the audio object location and the user location 201 will affect the audible presentation because spatial audio is rendered based on the location of the audio object relative to the user 201 location.
In the above example, the predetermined-stretched-bubble-distance was used by the apparatus 101 to determine when to deselect and thus switch back to spatial audio presentation of an audio track. In one or more examples, the apparatus 101 may additionally or alternatively provide said change in audible presentation based on a user request. Thus, with the fourth audio track selected and audibly presented as one of monophonic and stereophonic audio and based on user input indicative of a desire to return to spatial audio presentation of the fourth audio track, the apparatus 101 may provide for a change in the audible presentation of the fourth audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio.
It will be appreciated that the relative locations of the user and/or the audio object may have changed since the audio object was selected and thus, when changing back to presentation as spatial audio the audio of the fourth audio track may be perceived from a direction based on the current audio-object location relative to the current user 201 location at the time of said user input or the time of the predetermined-stretched-bubble-distance being exceeded.
The apparatus 101 may provide for changes requested by way of the audio mixing user interface 500 to be audibly presented in real time to the user. The changes may be automatically saved or may require a user input to be saved for future presentation of the spatial audio content or for application to live spatial audio content for onward transmission.
In one or more examples, based on receipt of user input to the audio mixing user interface 500 causing changes to one or more audio parameters of the audio track and, subsequently, the bubble 305 is broken due to the distance between the user location and the audio-object location increasing beyond the predetermined-stretched-bubble-distance or user input indicative of a desire to return to spatial audio presentation of the fourth audio track, the apparatus may be configured to provide for discarding of the changes to one or more audio parameters. Thus, to save the changes the user must provide a save command prior to said bubble 305 breaking. In other examples, on breaking of the bubble 305 the current changes to the audio parameters are saved.
Although the examples above illustrate the operation of the apparatus 101 in relation to selection of the fourth audio object, any of the audio objects may be selected.
With reference to figures 8 and 9, on breaking of the bubble 305, the audible presentation of the fourth audio track may abruptly transition from monophonic/stereophonic presentation to spatial audio presentation from a particular location in the virtual space and thus a particular direction. This may be confusing.
In one or more examples, the presentation of audio monophonically or stereophonically may be considered to locate the source of the audio within a user's head. Accordingly, on transition to presentation as spatial audio, the apparatus 101 may provide for rendering of the spatial audio of the now deselected fourth audio object 205 from at least one or more intermediate positions between the user 201 location and the current location of the fourth audio object (as shown in figure 9) over a transition period. This may be perceived as the audio of the fourth audio track gradually moving away from its in-head presentation (monophonically/stereophonically) to its true, current position in the virtual space 200. Thus, between the audible presentation of the fourth audio track to the user as at least one of monophonic and stereophonic audio and the presentation of the fourth audio track as spatial audio, the apparatus 101 may be caused to provide for audible presentation of the fourth audio track with a transitional effect comprising the perceived origin of the audio of fourth audio track progressively moving away from the user 201 location to the current fourth audio-object 205 location over a transition time. The transition time may comprise less than a second or less than one, two, three, four or five seconds or, in general, the transition time may range from almost instantaneous to multiple seconds.
When presenting spatial audio, it may be advantageous to apply effects to the audio to replicate how the audio would be heard in a particular room. In one or more examples, the audio tracks may comprise audio captured from close up microphones. Audio captured of an audio source by close up microphones may sound different to audio heard by a user in a room with the audio source because the user would hear the same audio but it would typically include component echos and reverberations caused by the sound waves from the audio source interacting with the surfaces in the room. Thus, in one or more examples, an audio effect termed a Room Impulse Response may be applied to the audio tracks which may make them sound as if heard in a particular room. The Room Impulse Response may comprise an audio processing function that simulates the effect of the surfaces of a particular room. The Room Impulse Response may also comprise a function of the user's location in the particular room relative to the audio object. The particular room itself may be based on the virtual reality content presented to the user. Thus, if the visual content of the VR content shows a hard walled, and therefore echo-prone, room, the apparatus may apply a Room Impulse Response function to replicate such an audio environment.
With application to the present apparatus 101, when presenting one of a plurality of audio tracks of the spatial audio content monophonically or stereophonically with the other audio tracks presented as spatial audio, the apparatus may be configured to apply a room impulse response function to at least one of said other audio tracks, the room impulse response function configured to modify the audio of the at least one other audio tracks to sound as if it is heard in a predetermined room with a particular location in said predetermined room.
With reference to figure 10, the particular location in said predetermined room may, in a first example, be determined based on the user 201 location and, in a second example, be determined based on the current location of the selected audio object. Figure 10 shows the location of the room impulse response function being based on the user location at 1001. Figure 10 shows the location of the room impulse response function being based on the location of the selected audio object at 1002. Accordingly, the different room impulse response functions may be applied to the audio tracks of the first, second, third and fifth audio objects 202, 203, 204, 206.
Accordingly, the apparatus 101 may provide for presentation of the other audio in the virtual space with a room impulse response function from either the user's 201 listening position or the position corresponding to the fourth audio object 205.
In one or more examples, the apparatus 101 may provide for user-selection of the Room Impulse Response function in terms of being based on the user location or based on the current location of the selected, fourth audio object.
In one or more examples, the Room Impulse Response function may be applied to the selected audio track to simulate the audio being heard within a room defined by the bubble 305.
In one or more examples, the Room Impulse Response function may be continually or periodically updated based on the current position in the room of the user 201 and/or the one or more audio objects.
Figure 11 shows a flow diagram illustrating the steps of,

based on 1101 spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space, the user having a user location in the virtual space from which to perceive the spatial audio and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;
providing for 1102 a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.

In one or more examples, the spatial audio content may include a second audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a second audio-object location in the virtual space. At least when the distance between the user location and the second audio-object location is greater than the predetermined-bubble-distance, the method may provide for audible presentation of the second audio track to the user as spatial audio based on the relative location of the user location and the second audio-object location while providing the above-mentioned change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track.
Figure 12 illustrates schematically a computer/processor readable medium 1200 providing a program according to an example. In this example, the computer/processor readable medium is a disc such as a digital versatile disc (DVD) or a compact disc (CD). In some examples, the computer readable medium may be any medium that has been programmed in such a way as to carry out an inventive function. The computer program code may be distributed between the multiple memories of the same type, or multiple memories of a different type, such as ROM, RAM, flash, hard disk, solid state, etc.
User inputs may be gestures which comprise one or more of a tap, a swipe, a slide, a press, a hold, a rotate gesture, a static hover gesture proximal to the user interface of the device, a moving hover gesture proximal to the device, bending at least part of the device, squeezing at least part of the device, a multi-finger gesture, tilting the device, or flipping a control device. Further the gestures may be any free space user gesture using the user's body, such as their arms, or a stylus or other element suitable for performing free space user gestures.
The apparatus shown in the above examples may be a portable electronic device, a laptop computer, a mobile phone, a Smartphone, a tablet computer, a personal digital assistant, a digital camera, a smartwatch, smart eyewear, a pen based computer, a non-portable electronic device, a desktop computer, a monitor, a smart TV, a server, a wearable apparatus, a virtual reality apparatus, or a module/circuitry for one or more of the same.
Any mentioned apparatus and/or other features of particular mentioned apparatus may be provided by apparatus arranged such that they become configured to carry out the desired operations only when enabled, e.g. switched on, or the like. In such cases, they may not necessarily have the appropriate software loaded into the active memory in the non-enabled (e.g. switched off state) and only load the appropriate software in the enabled (e.g. on state). The apparatus may comprise hardware circuitry and/orfirmware. The apparatus may comprise software loaded onto memory. Such software/computer programs may be recorded on the same memory/processor/functional units and/or on one or more memories/processors/ functional units.
In some examples, a particular mentioned apparatus may be pre-programmed with the appropriate software to carry out desired operations, and wherein the appropriate software can be enabled for use by a user downloading a "key", for example, to unlock/enable the software and its associated functionality. Advantages associated with such examples can include a reduced requirement to download data when further functionality is required for a device, and this can be useful in examples where a device is perceived to have sufficient capacity to store such pre-programmed software for functionality that may not be enabled by a user.
Any mentioned apparatus/circuitry/elements/processor may have other functions in addition to the mentioned functions, and that these functions may be performed by the same apparatus/circuitry/elements/processor. One or more disclosed aspects may encompass the electronic distribution of associated computer programs and computer programs (which may be source/transport encoded) recorded on an appropriate carrier (e.g. memory, signal).
Any "computer" described herein can comprise a collection of one or more individual processors/processing elements that may or may not be located on the same circuit board, or the same region/position of a circuit board or even the same device. In some examples one or more of any mentioned processors may be distributed over a plurality of devices. The same or different processor/processing elements may perform one or more functions described herein.
The term "signalling" may refer to one or more signals transmitted as a series of transmitted and/or received electrical/optical signals. The series of signals may comprise one, two, three, four or even more individual signal components or distinct signals to make up said signalling. Some or all of these individual signals may be transmitted/received by wireless or wired communication simultaneously, in sequence, and/or such that they temporally overlap one another.
With reference to any discussion of any mentioned computer and/or processor and memory (e.g. including ROM, CD-ROM etc), these may comprise a computer processor, Application Specific Integrated Circuit (ASIC), field-programmable gate array (FPGA), and/or other hardware components that have been programmed in such a way to carry out the inventive function.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole, in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that the disclosed aspects/examples may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.
While there have been shown and described and pointed out fundamental novel features as applied to examples thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the scope of the disclosure. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the disclosure. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or examples may be incorporated in any other disclosed or described or suggested form or example as a general matter of design choice. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures.

Claims

An apparatus comprising:
at least one processor; and

at least one memory including computer program code,

the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;

provide for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.
The apparatus of claim 1, wherein based on the relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond a predetermined-stretched-bubble-distance which is greater than the predetermined-bubble-distance;
provide for a change in the audible presentation of the first audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location.
The apparatus of claim 1 or claim 2, wherein one or both of:
i) user initiated movement of their user location in the virtual space; and

ii) movement of the audio-object in the virtual space;
provides for the relative movement.
The apparatus of any preceding claim, wherein the spatial audio content comprises a plurality of audio tracks including at least the first audio track and a second audio track, the second audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a second audio-object location in the virtual space; and
based on the presentation of said first audio track as at least one of monophonic and stereophonic audio, provide for audible presentation of the second audio track as spatial audio such that the audio of the second audio track is perceived from a direction based on the second audio-object location relative to the user location.
The apparatus of any preceding claim, wherein, with the first audio track audibly presented as one of monophonic and stereophonic audio, based on user input indicative of a desire to return to spatial audio presentation of the first audio track, provide for a change in the audible presentation of the first audio track to the user from presentation as at least one of monophonic and stereophonic audio to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location.
The apparatus of any preceding claim wherein in addition to the apparatus being caused to provide for a change in the audible presentation of the first audio track, the apparatus is caused to provide an audio mixing user interface for modification of one or more audio parameters of the first audio track.
The apparatus of claim 6, wherein the one or more audio parameters comprise at least one or more of: volume, bass level, mid-tone level, treble-level, reverberation level and echo.
The apparatus of claim 6 or 7, wherein based on receipt of user input to the audio mixing user interface causing changes to one or more audio parameters of the first audio track and, subsequently, at least one of:
i) relative movement between the user location and the audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond a predetermined-stretched-bubble-distance which is greater than the predetermined-bubble-distance;

ii) user input indicative of a desire to return to spatial audio presentation of the first audio track;
provide for one of:
c) discarding of the changes to one or more audio parameters of the first audio track unless a user initiated save input is received; and

d) a change in the audible presentation of the first audio track to presentation as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location with the changes to the one or more audio parameters applied.
The apparatus of claim 2 or claim 5, wherein between the audible presentation of the first audio track to the user as at least one of monophonic and stereophonic audio and the presentation of the first audio track as spatial audio such that the audio of the first audio track is perceived from a direction based on the first audio-object location relative to the user location, the apparatus is caused to provide for audible presentation of the first audio track with a transitional spatial audio effect comprising the perceived origin of the audio of first audio track progressively moving away from the user location to the current first audio-object location.
The apparatus of claim 2, wherein based on the relative movement between the user location and the audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase to within a threshold of the predetermined-stretched-bubble-distance, provide for audible presentation of the first audio track to the user as at least one of monophonic and stereophonic audio with an audio effect to thereby audibly indicate that the user is approaching the predetermined-stretched-bubble-distance.
The apparatus of any preceding claim wherein when presenting one of a plurality of audio tracks of the spatial audio content monophonically or stereophonically with the other audio tracks presented as spatial audio, the apparatus is caused to apply a room impulse response function to at least one of said other audio tracks, the room impulse response function configured to modify the audio of the at least one other audio tracks to sound as if it is heard in a predetermined room with a particular location in said predetermined room, the particular location in said predetermined room based on either:
i) the user location; and

ii) a current location of the first audio object.
The apparatus of claim 11, wherein the apparatus is caused to provide for user-selection of one of:
a) the Room Impulse Response function based on the user location; and

b) the Room Impulse Response function based on the current location of the first audio object.
The apparatus of claim 1, wherein the spatial audio content is audibly presented as spatial audio by processing the first audio track using one or more of:
i) a head-related-transfer-function filtering technique;

ii) a vector-base-amplitude panning technique; and

iii) binaural audio presentation.
A method, the method comprising
based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;
providing for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.
A computer readable medium comprising computer program code stored thereon, the computer readable medium and computer program code being configured to, when run on at least one processor, perform the method of:
based on spatial audio content comprising at least one audio track comprising audio for audible presented to a user as spatial audio such that the audio is perceived to originate from a first audio-object location in a virtual space relative to a user location of the user in the virtual space, and based on the distance between the user location and the first audio-object location having decreased to less than a predetermined-bubble-distance;

providing for a change in the audible presentation of the first audio track to the user from presentation as spatial audio to presentation as at least one of monophonic and stereophonic audio for audio mixing of said first audio track, and wherein said presentation of said first audio track as at least one of monophonic and stereophonic audio is maintained irrespective of relative movement between the user location and the first audio-object location in the virtual space causing the distance between the user location and the first audio-object location to increase beyond the predetermined-bubble-distance.