WO2022238476A1

WO2022238476A1 - Method and system for manipulating audio components of a music work

Info

Publication number: WO2022238476A1
Application number: PCT/EP2022/062767
Authority: WO
Inventors: Kurran John KARBAL; Thomas Oliver MARSH
Original assignee: Altered States Technologies Ltd.
Priority date: 2021-05-11
Filing date: 2022-05-11
Publication date: 2022-11-17
Also published as: EP4338154A1; GB2606538A

Abstract

A computer-implemented method for manipulating an audio track, comprising a plurality of initial audio components, being presented at an audio output is provided. The audio track may be manipulated by way of an input via an input mechanism which comprises the selection of a user-selectable region on an interface, where the user-selectable region is associated with a set of audio parameters which are subsequently applied to the plurality of initial audio components. Also provided is a corresponding system and computer readable medium. These enable a user, particularly an average listener with no technical expertise, to manipulate an audio track in an efficient and intuitive manner while that audio track is played back, for instance, within the context of a music playback application.

Description

METHOD AND SYSTEM FOR MANIPULATING AUDIO COMPONENTS OF A

MUSIC WORK

Technical field

[0001] The present disclosure relates to a method for manipulating, via an input mechanism, an audio track being presented at an audio output. More specifically, it relates to a method for changing the mix of an audio track, such as a song, being present at an audio output.

Background

[0002] The state of the art in consuming recorded music through music playback applications (such as a third-party cloud application) is to listen to any of artist-released mixes and remixes of songs which are either pre-selected by the user or suggested sequentially by an algorithm according to metadata associated with, for instance, listening history. Such metadata may include the songs’ audio profiles, genres, styles, tempos, moods and any other measurable characteristics. The prior art algorithms seek to predict and pre-select artist-released mixes and remixes of songs which a user is likely to listen to, thus removing the need for the user to manually select songs and playlists for playback.

[0003] However, if the song suggested by the algorithm has not been accurately predicted (for instance, if it does not meet the user’s current requirements in terms audio profile, genre, style, tempo, mood or any other measurable characteristics), it is likely to be “skipped” by the user. Consequently, the algorithm will note the skipped song and may subsequently supress, not just that song, but also other songs which have similar audio profiles, genres, styles, tempos, moods or any other measurable characteristics to the skipped track. This may lead to the unnecessary suppression of songs which may, in fact, be suitable for suggesting to the user were they to be manipulated, modified or tweaked slightly.

[0004] In that respect, listening to music via state-of-the-art music playback applications is currently a predominantly passive process. The user is only able to listen to discrete artist-released mixes and remixes of songs and, apart from the ability to skip or fast forward, the ordinary user is unable to manipulate these mixes and remixes to personalise their listening experiences and to, for instance, fit the user’s current listening or currently predicted playlists.

[0005] In contrast to an average listener on a music playback application, an artist or producer in a recording studio has control over the sound of the mixes or remixes via a digital audio workstation (DAW) which acts as the artist’s feedback and integration interface with their music. In most cases only the artist’s official mixed release or releases are available to the public - meaning that the listening public are unable to interact with the musical piece in the same way as the artist or producer. Even if the original individual audio layers (e.g. the initial audio components) which make up the musical piece were to be made available to the public, a DAW would be required to playback and analyse the elements together. Furthermore, a significant amount of musical and technical knowledge would be required to modify and refine the elements, putting this process and the resulting customised mix out of reach of the average listener. Furthermore still, modification of a song within a DAW is an offline process. It cannot be carried out on-the-fly, in conjunction with a music playback

1 application. Rather, an audio file must be first exported from a DAW before that audio file can be added to, for instance, a playlist within a music playback application. Additionally, a DAW also relies on inputs from the artist, producer and collaborators, and is unable to predict and suggest changes based upon the listener’s musical preferences.

[0006] Accordingly, a need exists in the state of the art for a method and system which solves the aforementioned deficiencies.

Summary

[0007] According to an aspect, there is provided a computer-implemented method for manipulating, via an input mechanism, an audio track being presented at an audio output, the audio track comprises a plurality of initial audio components configured to be presented simultaneously at the audio output, the input mechanism in communication with a display, the display presents a first plurality of user-selectable regions, each user-selectable region in the first plurality of user-selectable regions are associated with a respective set of audio parameters. The method comprises: receiving a first user input from a user via the input mechanism, wherein the first user input comprises selecting a first user-selectable region in the first plurality of user-selectable regions; applying a first set of audio parameters associated with the first user-selectable region to the plurality of initial audio components to produce a plurality of parameterised audio components, wherein each parameterised audio component corresponds to an initial audio component; and presenting the parameterised audio components at the audio output.

[0008] Optionally, each user-selectable region may be defined by a range of positions on an x- axis and a y-axis of an associated coordinate grid. The first user input may comprise a first input of positional information on the associated coordinate grid. Selecting a first user-selectable region may be based on the input of positional information. Optionally, each user-selectable region may be further defined by a range of positions on a z-axis of the associated coordinate grid. Optionally the display may comprise the associated coordinate grid.

[0009] Optionally, the first user-selectable region may be a sub-region of a first envelope region and a second envelope region. The first envelope region may be defined by a first envelope range of positions on the associated coordinate grid and the second envelope region may be defined by a second envelope range of positions on the associated coordinate grid. The first envelope region and the second envelope region may overlap. Each of the first and second envelope regions may be associated with a respective set of audio parameters. Here, the first set of audio parameters may comprise a ratio of audio parameters, the ratio of audio parameters comprising a term corresponding to the audio parameters associated with the first envelope region and a term corresponding to the audio parameters associated with the second envelope region.

[0010] Optionally, the first user-selectable region may additionally be a sub-region of a third envelope region. Here, the ratio of audio parameters may further comprise a term corresponding to the audio parameters associated with the third envelope region.

[0011] Optionally, the ratio of audio parameters may be a constant across the first user- selectable region. Optionally, the ratio of audio parameters may equal one.

2 [0012] Optionally, the term corresponding to the audio parameters associated with the first envelope region may be inversely correlated with a distance between the positional information and a boundary of the first envelope range of positions on the associated coordinate grid, and the term corresponding to the audio parameters associated with the second envelope region may be inversely correlated with a distance between the positional information and a boundary of the second envelope range of positions on the associated coordinate grid.

[0013] Optionally, the term corresponding to the audio parameters associated with the first envelope region may be proportional to a distance between the positional information and a center of the first envelope range of positions on the associated coordinate grid, and the term corresponding to the audio parameters associated with the second envelope region may be proportional to a distance between the positional information and a center of the second envelope range of positions. [0014] Optionally, the computer-implemented method may further comprise: receiving an updated user input from a user via the input mechanism, wherein the updated user input comprises updated positional information on the associated coordinate grid; replacing, at the audio output, the plurality of parameterised audio components being presented at the audio output with an updated plurality of parameterised audio components, wherein the updated plurality of parameterised audio components is the result of applying a set of audio parameters associated with the updated positional information on the coordinate grid to the plurality of initial audio components.

[0015] Optionally, the input mechanism may be configured to receive a gesture, wherein the gesture comprises a movement on the associated coordinate grid from the first input of positional information to the updated user input. The movement may comprise a movement speed. The plurality of parameterised audio components being presented at the audio output may be replaced with an updated plurality of parameterised audio components at an update speed, the update speed being positively correlated with the movement speed. Optionally the input mechanism comprises at least one of: a touch screen, the gesture comprising a gesture on the touch screen; or a cursor on the display, the gesture comprising the movement of the cursor.

[0016] Optionally, the input mechanism may comprise at least one of a touch screen, a mouse, a trackpad, a clicker, an accelerometer, a button, a stylus, a microphone, a handset controller, a visual sensor and/or a GPS sensor.

[0017] Optionally, applying a first set of audio parameters comprises applying a first set of audio parameters to the plurality of chosen initial audio components.

[0018] Optionally, the computer-implemented method may further comprise: determining that the parameterised audio components being presented at the audio output contain auditory masking which exceeds an auditory masking threshold; and applying a second set of audio parameters to the parameterised audio components to lessen the auditory masking to below the auditory masking threshold.

[0019] Optionally, the computer-implemented method may further comprise receiving a second user input from a user via the input mechanism, wherein the second user input comprises a command to one of: pause, fast-forward or rewind the audio track being presented at the audio output.

3 [0020] Optionally, the first set of audio parameters may comprise a plurality of subsets of audio parameters, each associated with a respective initial audio component, wherein applying a first set of audio parameters associated with the first user-selectable region to the plurality of initial audio components comprises applying each subset of audio parameters to the associated initial audio component.

[0021] Optionally, the plurality of initial audio components may comprise one or more of: track- length audio components, stems, lossless audio files, lossy audio files, individually recorded audio files, composites of individually recorded audio files, stereo audio files, mono audio files, pre- processed audio files, or audio files reconstructed using MIDI.

[0022] Optionally, audio parameters may include any of: tempo, key, gain, volume, pan position, equalisation, compression, limiter controls, reverb, delay, distortion, chorus, vibrato, tremolo, pitch shift, software effects, or hardware effects which perform mathematical manipulation of the audio signal.

[0023] Optionally, the computer-implemented method may further comprise applying a master set of audio parameters to the combined parameterised audio components.

[0024] Optionally, any of the sets of audio parameters may be pre-prepared, user-defined or learned based on previous user behaviour.

[0025] Optionally, the display may present a second plurality of user-selectable regions. The user-selectable regions in the second plurality of user-selectable regions may be associated with a respective initial audio component. The method may further comprise: receiving a third user input from a user via the input mechanism, wherein the third user input comprises selecting a second user- selectable region in the second plurality of user-selectable regions; receiving a fourth user input from a user via the input mechanism, wherein the fourth user input comprises selecting a third user- selectable region in the first plurality of user-selectable regions; applying a subset of audio parameters in a third set of audio parameters which are associated with the third user-selectable region to the initial audio component associated with the second user-selectable region to produce an updated parameterised audio component which corresponds to the initial audio component associated with the second user-selectable region; and responsive to receiving the third user input and the fourth user input, replacing, at the audio output, the parameterised audio component being presented at the audio output which corresponds to the initial audio component associated with the second user-selectable region with the updated parameterised audio component.

[0026] According to a second aspect, there is provided a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform any of the methods disclosed herein.

[0027] According to a third aspect, there is provided a system with an internal memory, a processor, an input mechanism, a display and an audio output, the processor configured to perform any of the methods disclosed herein.

Brief description of the drawings

[0028] The figures depict various embodiments of the present invention for the purposes of illustration and by way of example only.

4 [0029] Figure 1 A is a schematic of an example device and components thereof on which the methods disclosed herein may be performed;

[0030] Figure 1 B shows an exemplary embodiment of the device of figure 1 A including an exemplary audio output;

[0031] Figure 2 shows an exemplary user interface;

[0032] Figures 3A and 3B show a schematic composition of an audio track and a set of audio parameters respectively;

[0033] Figure 3C illustrates how a set of audio parameters may be applied to an audio track according to embodiments of the present disclosure;

[0034] Figures. 4A and 4B illustrate exemplary embodiments of the first plurality of user- selectable regions;

[0035] Figure 5 illustrates an exemplary embodiment of the first plurality of user-selectable regions;

[0036] Figure 6 illustrates an exemplary embodiment of the first plurality of user-selectable regions;

[0037] Figure 7 illustrates exemplary embodiments of the second plurality of user-selectable regions;

[0038] Figures 8A-8C and 9 illustrate flowcharts depicting various methods disclosed herein. [0039] Like reference numerals throughout the drawings related to like features but are not limited thereto. It will be understood that features illustrated by way of dashed, or otherwise non- continuous lines are to be understood to be optional features. Equally, any features illustrated by continuous lines should not be construed as being compulsory to the embodiments of the present invention.

Detailed description

Device for manipulation and playback of an audio track

[0040] Figures 1 A and 1 B show a device capable of and suitable for manipulation and playback of an audio track according to embodiments of the present disclosure.

[0041] As shown on figure 1 A, the device 100 may comprise a number of functionally defined components. The device 100 may include a processor 102, an input mechanism 104, a display 106, an audio output 108, an internal memory 110 and a network interface 118. One or more of these components may be optional. Additionally or alternatively, the function of two or more of these components may be provided by a single component. For instance, if the device 100 comprises a touch screen, the touch screen may act as both the input mechanism 104 and the display 106. The device 100 may further comprise an additional input mechanism 104, an additional display 106 and/or an additional audio output 108.

[0042] The device 100 may be, for instance, any conventional information appliance or computational device such as a smartphone, a tablet, a personal computer and/or a virtual-reality headset.

5 [0043] The processor 102 may include any type of conventional processor, digital signal processor, microprocessor, multiprocessor, or processing logic that interprets and executes instructions. The processor 102 may be configured to carry out any or all of the methods of the present disclosure.

[0044] The input mechanism 104 may include a conventional mechanism that enables the device 100 to receive commands, instructions, or other inputs from a user. The input mechanism 104 may comprise any combination of hardware and software suitable for receiving a user input and translating that user input into computer-interpretable instructions, such as, instructions interpretable by the processor 102. Example inputs capable of being received by the input mechanism 104 include: a touch on a touch screen, a gesture on a touch screen, a movement of a cursor on a display, a movement of the device 100 (for instance, a rotation of the device 100), a click of a clicker (for instance, a computer mouse), a button press, a stylus tap, a voice command, a gesture and/or location data. Accordingly, the input mechanism 104 may include a touch screen, a mouse, a trackpad, a clicker, an accelerometer, a button, a stylus, a microphone, a handset controller, a visual sensor and/or a GPS sensor.

[0045] The display 106 is configured to present a user interface to a user and may include any conventional mechanism to output a display of visual information to a user. The display 106 may be, for instance, a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a light emitting diode (LED) display, a plasma display and/or a projector screen receiving a projection from a projector. In some embodiments the input mechanism 104 may be in communication with the display 106. In some embodiments, the display 106 may comprise a touch screen display. In some embodiments, the touch screen display may act as both the display 106 and the input mechanism 104. In some of these embodiments, the device 100 may further include an additional input mechanism 104 and/or an additional display 106.

[0046] The audio output 108 may include any conventional device for outputting audio, such as internal speakers, a headphone jack (through which external headphones or speakers may be connected), wirelessly-connected external headphones, wirelessly-connected external speakers, an internal sound card, an external sound card and the like. Wirelessly-connected external headphones and wirelessly-connected external speakers may be connected to the device via, for instance, Bluetooth. An external sound card may be connected to the device via, for instance, USB-A, USB-B, USB-C and/or Thunderbolt™.

[0047] The internal memory 110 may include any conventional computer-readable medium (transitory, non-transitory or otherwise) in which computer-readable instructions are stored which, when executed by the processor 102, cause the processor to perform the method of the computer- readable instructions. Such instruction may include instructions to carry out any or all of the methods of the present disclosure. Such instruction may additionally or alternatively comprise instructions to run the music manipulation application 112, a computer application which, when run on the device 100, allows the user to access, manipulate and listen to an audio track from a library or database of audio tracks. Example computer-readable media which may comprise the internal memory 110 include any physical or logical memory device including hard disk drive (HDD) storage, solid state

6 drive (SSD) storage, random access memory (RAM), dynamic RAM, read only memory (ROM), flash memory, magnetic tape, a rigid magnetic disk and/or an optical disc.

[0048] The internal memory 110 may additionally or alternatively store content data. Content data may be stored on any computer-readable medium which comprises the internal memory 110, such as those outlined above. Content data may include music content (which may be stored in the music content storage 114) and/or audio parameter content (which may be stored in the audio parameter storage 116). In some embodiments the music content and/or audio parameter content may be provided without the dedicated music content storage 114 and/or audio parameter storage 116 locations respectively. The music content storage 114 and/or the audio parameter storage 116 may be located within the music manipulation application 112 within the internal memory 110 (as illustrated on figure 1 A), although the music content storage 114 and/or the audio parameter storage 116 are not limited thereto. In an alternative embodiment, the music content storage 114 and/or the audio parameter storage 116 may be located within the internal memory 110, separate from the music manipulation application 112 and the music manipulation application 112 may be configured to fetch music content and audio parameter content from the music content storage 114 and/or the audio parameter storage 116. The music manipulation application 112 may additionally be configured to provide the playback and playlist, functionality of a music playback application.

[0049] Music content may include one or more audio tracks. An audio track may be structured and configured in the manner disclosed herein, such as in the manner described belowwith reference to figure 3A. An audio track may be provided with an associated set of metadata. Such metadata may include, for instance, song title, artist name, genre, sonic profile, an indication of historic audio parameters which have previously been applied to the audio track, which may be user-specific, and/or an indication of previous user behaviour in relation to the audio track.

[0050] Audio parameter content (which may be stored in the audio parameter storage 116) may include one or more sets of audio parameters. A set of audio parameters may be structured and configured as disclosed herein, such as in the manner described below with reference to figure 3B. A set of audio parameters may be associated with a given audio track, or each set of audio parameters may be provided as a standalone set. A set of audio parameters may be provided with an associated set of metadata. Such metadata may include, for instance, a classification of the set of audio parameters, a sonic profile of the set of audio parameters, an indication of historic audio track(s) to which the set of audio parameters have been applied, which may be user-specific, and/or an indication of previous user behaviour in relation to the set of audio parameters.

[0051] By including metadata indicating historic audio parameters which have previously been applied to the audio track and/or indicating historic audio track(s) to which the set of audio parameters have been applied respectively, the system of the present disclosure may be configured to learn and predict future user behaviour and predict appropriate sets of audio parameters to apply to further audio tracks. This prediction may be carried out automatically via computational means (such as machine learning and/or employing neural networks) and may enable the system to provide a user with tailor made, user-specific sets of audio parameters. Accordingly, metadata of the type described herein may, allows the user to more efficiently manipulate the audio track via the input mechanism

7 104 in a desired manner. This prediction may additionally or alternatively be enabled by metadata indicating of previous user behaviour in relation.

[0052] The prediction may be carried out based on historically monitored inputs, commands, changes, modifications or selections carried out by the user. Specifically, a large array of measurable variables associated with the audio track may be monitored, such as audio profile, audio file energy content, key, tempo, time signature, dynamics, instruments, mix style, effects and so on. Accordingly, the system can construct a profile of suitable audio tracks and audio components tailor-made for a user as a function of audio parameters. The measurable variables associated with the profile of suitable audio tracks can then be compared to measurable variables associated with a new audio track. Based on this comparison, audio parameters for the new audio track can be selected. Furthermore, this prediction can be augmented by monitoring (and performing a statistical analysis on), for instance, track sequence order, time of day, general interaction levels and so on. This monitoring can segregate a set of commands based on any of these variables and therefore create a new set of comparison criteria with which to assess new musical content.

[0053] In yet another embodiment, the music manipulation application 112, the music content storage 114 and/or the audio parameter storage 116 may be located on or coupled to a remote database, accessible by the device via the network interface 118. The network interface 118 may be any conventional network interface, such as Ethernet, Wi-Fi, a local area network (LAN) port, a wide area network (WAN) port, or the like.

[0054] Figure 1 B shows an exemplary embodiment of the device of figure 1A, wherein the device 100 is a smartphone 120, the input mechanism 104 is a touch screen 122, the display 106 is the same touch screen 122, the output device 106 is a set of wireless headphones 124 communicatively coupled to the processor via a wireless connection 126. Figure 1 B is provided for illustrative purposes only and should not be construed as limiting.

User interface for manipulating an audio track

[0055] Figure 2 shows an exemplary user interface 200 for allowing a user to manipulate, via an input mechanism, an audio track being presented at an audio output (or being played out or back) according to embodiments of the present disclosure. The user interface 200 may be presented to the user at the same time as an audio track is presented at the audio output. The user interface 200 may be an interface of the music manipulation application 112. Figure 2 depicts the user interface on a touch screen of a smartphone, however other devices and displays are contemplated. The positional and geometric relationships between the respective features are set out in figure 2 for illustrative purposes only and are in no way to be construed as limiting. Similarly, the layout, shape, size and indicators of each feature on the user interface 200 are set out in figure 2 for illustrative purposes only and are in no way to be construed as limiting.

[0056] The user interface 200 provides a control surface to enable a user to manipulate the mix of a song being presented at an audio output 108 via a user input on the user interface 200. In this way, a user is able to tailor their listening experience in an efficient manner by altering the mix of a song being played back as desired. The mix of a song being played back can be altered on-the-fly,

8 while the mix is being played back. For instance, the control surface allows a user to create and listen to their own customised mix of a song (or audio track) with a customised sonic profile in an interactive manner. This customised mix may differ from an original mix of the song released by the artist or record label and therefore methods disclosed herein allow a user to experience unique mixes of a song in an efficient and accessible manner. The control surface may also enable a user to control playback of the song.

[0057] The user interface 200 may include a first plurality of user-selectable regions 202, a second plurality of user-selectable regions 204, action controls 206, undo/redo controls 208 and/or song transport controls 210. Any or all of these regions may be selectable by the user via the input mechanism. In some embodiments, one or more of these features may be omitted from the user interface.

[0058] The first plurality of user-selectable regions 202, also referred to as mix selection controls, enables a user to select a set of audio parameters, the members of which are to be applied to the selected initial audio components, thereby affecting the audio track optionally being presented at an audio output 108 (or being played out or back). The user selection of the audio parameters may be carried out via the input mechanism 104. Embodiments of and methods relating to the first plurality of user-selectable regions are described in more detail below with reference to figures 4A, 4B, 5, 6 and 8A-8C.

[0059] The second plurality of user-selectable regions 204, also referred to as initial audio component controls, may enable a user to select at least one initial audio component to which members of a set of audio parameters or a single audio parameter may be applied, thereby affecting the audio track optionally being presented at an audio output 108 (or being played out or back). The user selection of the initial audio components may be carried out via the input mechanism 104. Embodiments of and methods relating to the second plurality of user-selectable regions are described in more detail below with reference to figures 7 and 9.

[0060] The action controls 206 enable a user, via an input mechanism 104, to save the remix parameters of the manipulated audio track being presented at the audio output 108, share the manipulated audio track being presented at the output with other users of the audio application or system or navigate the menus of the music manipulation application 112. Once a user has saved the remix parameters of the audio track being presented at the audio output 108, the user may access that saved set of remix parameters at a later date.

[0061] The undo/redo controls 208 may enable a user, via an input mechanism 104, to undo (and then optionally subsequently redo) the effect of the latest user input on the first plurality of user- selectable regions 202. The undo control can be chained to undo a series of latest user inputs. For example, if a user selects, via the input mechanism, the undo control three times, the three latest user inputs on the first plurality of user-selectable regions 202 may be undone. Similar chaining functionality is provided with respect to the redo control.

[0062] The song transport controls 210 may enable a user, via an input mechanism 104, to pause, play, rewind, restart, fast forward, or skip to the end of the current audio track being presented at an audio output 108 (or being played out or back).

9 Audio track, initial audio components and audio parameters

[0063] Figures 3A and 3B show the schematic composition of an audio track 300 and a plurality of sets of audio parameters 312 respectively.

[0064] As shown on figure 3A, the audio track 300 may comprise a plurality of initial audio components 302a, 302b, 302c, 302d, 302n. In this way, the audio track 300 may be a composite or summation of the plurality of initial audio components 302a, 302b, 302c, 302d, 302n.

[0065] Each initial audio component may be saved within a directory corresponding to the audio track 300 within the internal memory 110, optionally within the music content storage 114, as an audio file in any conventional format for storing audio data on a device 100. Such formats include, for instance, uncompressed audio formats - such as waveform audio file format (WAV), audio interchange file format (AIFF), Au file format (AU) or RAW header-less pulse-code modulation (PCM)

- formats with lossless compression - such as free lossless audio codec (FLAC), Apple™ lossless audio codec (ALAC) or Windows™ media audio (WMA) lossless - or formats with lossy compression

- such as, MP3, advanced audio coding (AAC), or WMA lossy.

[0066] The plurality of initial audio components 302a, 302b, 302c, 302d, 302n may be configured to be presented simultaneously at the audio output 108 as, for instance, a mono, stereo mixdown, multi-speaker surround or spatial mixdown. The audio track 300 may include a runtime (e.g. a track-length), defined as the length of time the audio track takes to play back in its entirety. Each initial audio component may also include a runtime. The runtime of each initial audio component 302a, 302b, 302c, 302d, 302n may equal the runtime of the audio track 300, in which case each initial audio component may be considered to be a track-length initial audio component. For example, if audio track 300 has a runtime of 3:30 (3 minutes and 30 seconds), then initial audio component 302a may have a runtime of 3:30, initial audio component 302b may have a runtime of 3:30, and so on.

[0067] The audio track 300 may be presented at the audio output 108 (or, in other words, played out, or played back). Playback may be carried out by executing, by the processor 102, playback instructions stored in internal memory 110. In some embodiments, presentation of the audio track 300 at the audio output 108 may comprise the simultaneous playback, at audio output 108, of each initial audio component in the plurality of initial audio components 302a, 302b, 302c, 302d, 302n. Simultaneous playback may be such that the playback of each initial audio component begins within a first threshold time and ends within a second threshold time of the playback of each other initial audio component. The first and second threshold times may be equivalent to a human perceptible latency threshold, such as 48ms, 24ms or 12ms.

[0068] Each initial audio component may be provided with an associated reference grid, which is related to samples within the initial audio file. The associated reference grid may include markers at one or more constant positions within each initial audio component. These markers may label equivalent points within each initial audio component and may enable a reduction in latency between each initial audio component when the initial audio components are presented at the audio output 108. These markers may be provided at regular intervals, such as either at a given timestamp or

10 after at a constant number of samples within each initial audio component and may take the form of a data marker within the audio file. In some embodiments, playback instructions may include instructions to detect when the reference grid for one initial audio component is desynchronised from the reference grid for another initial audio component by greater than a threshold time (such as a human perceptible latency threshold, as above). Responsive to this detection, playback instruction may include instructions to correct for the desynchronization by, for instance, temporarily speeding up or slowing down one or more of the initial audio components or translating the audio in time by moving the pointer to the audio array.

[0069] Each initial audio component in the plurality of initial audio components 302a, 302b, 302c, 302d, 302n may be one or more of individually recorded audio files, a composite of individually recorded audio files, stereo audio files, mono audio files, pre-processed audio files, or audio files reconstructed using MIDI. In some embodiments, each initial audio component may be a stem, defined as a grouped collection of audio sources or recordings mixed to form a logical whole (and optionally post-processed). For instance, a drum kit stem may comprise a grouped collection of audio sources or recordings of each component of the drum kit individually (e.g. kick drum, snare, high tom, mid tom, floor tom, overheads, hi-hats) mixed to form the logical whole that is the entire drum kit. Similarly, a backing vocals stem may comprise a grouped collection of audio sources or recordings of each backing vocal individually mixed to form the logical whole that is the combined backing vocals. In some embodiments, a stem may be defined as a single audio source or recording. For example, a lead vocal stem may comprise solely the audio source or recording of the lead vocals or a lead guitar stem may comprise solely the audio source or recording of the lead guitar.

[0070] Figure 3A depicts an audio track 300 comprising at least 5 initial audio components 302a, 302b, 302c, 302d, 302n but other numbers of initial audio components are contemplated. In some embodiments audio track 300 may comprise 8 initial audio components. In other embodiments, audio track 300 may comprise 2, 3, 4, 5, 6, 7, 9, 10, 11 , 12, 13, 14, 15, 16 or greater than 16 initial audio components. In some embodiments, audio track 300 may comprise between a lower limit and an upper limit of initial audio components, wherein the lower limit may be any integer between 2 and 15, optionally between 2 and 8 and optionally between 4 and 7 and the upper limit may be any integer between the lower limit and 32, optionally between the lower limit and 16, optionally between the lower limit and 12 and optionally still between the lower limit and 9.

[0071] As shown on figure 3B, the system includes a plurality of sets of audio parameters 312. The number of sets of audio parameters 304a, 304b, 304m may equal the number of mixes, m, available to be applied to the initial audio components of the audio track being presented at the audio output. Each set 304a, 304b, 304m of audio parameters contains a plurality of members (subsets of audio parameters 306aa, 306ab and so on), optionally equal to the number of initial audio components 302a, 302b, 302c, 302d, 302n that comprise the audio track 300, where each of the members of the sets of audio parameters 304a, 304b, 304m is a subset of audio parameters. Set 304a may comprise subsets 306aa, 306ab, 306ac, 306ad, 306an, set 304b may comprise subsets 306ba, 306bb, 306bc, 306bd, and 306bn, and set 304m may comprise subsets 306ma, 306mb, 306mc, 306md, and 306mn. Any one subset of audio parameters 306aa, to 306mn may contain

11 members (which may comprise a number of different audio parameters or audio signal-related variables) that may be suitable for application to one or more of the initial audio components in the plurality of initial audio components 302a, 302b, 302c, 302d, 302n, as described above and herein. Accordingly, the subsets of audio parameters 306aa, 306ba, 306ca, 306da, 306ma may be applicable to initial audio component 302a, subsets of audio parameters 306ab, 306bb, 306cb, 306db, 306mb may be applicable to initial audio component 302b, and subsets of audio parameters 306an, 306bn, 306cn, 306dn, 306mn may be applicable to initial audio component 302n. Figure 3B shows 3 sets of audio parameters 304a, 304b, 304m, but other numbers of sets of audio parameters are contemplated. In some embodiments the plurality of sets of audio parameters 312 may comprise more than 3 sets of audio parameters.

[0072] An audio parameter may be defined as, for instance, an audio manipulator which, when applied to an audio file, alters the sonic properties of that audio file. Audio parameters include, for instance, audio effects (such as gain, distortion, overdrive, equalisation, compression, reverb, delay, distortion, chorus, vibrato, tremolo, pitch shift, software effects, or hardware effects which perform mathematical manipulation of the audio signal), compositional properties (such as tempo, key or time signature), mix properties (such as limiter controls, volume or pan position) or loudness relative to other initial audio components.

[0073] In some implementations, for reasons such as computer processing efficiency and sound quality, where like audio parameters are to be applied to multiple initial audio components, the multiple audio components may be grouped together and the audio parameter applied to the group, rather than being applied separately to each of the individual initial audio components. Additionally, a number of audio parameters may be applied to each initial audio component sequentially in order to improve sound quality by, for instance, ensuring that the key frequencies of an instrument are preserved. For instance, audio parameters may be applied to one initial audio component (e.g. the drums stems), then frequency domain information associated with that initial audio component may be used to calculate the optimal audio parameters for another initial audio component (e.g. the bass stem) to ensure that both stems occupy distinct spaces in the mix.

[0074] The sets of audio parameters 304a, 304b, 304m, (and optionally each subset of audio parameters 306aa, 306ab, to 306mn) may be any of pre-prepared, user-defined or learned based on previous user behaviour. When one or more of the sets of audio parameters 304a, 304b, 304m (and optionally any subset of audio parameters 306aa to 306mn) is learned based on previous user behaviour, the system may be configured to predict a suitable set of audio parameters for a given audio track. This prediction may be carried out automatically via computational means (such as machine learning and/or neural networks) and may be based on the sonic profile of audio parameters which are regularly selected by the user as a function of the metadata associated with the audio track being presented at the output (optionally the metadata associated with the genre or sonic profile of the audio track). In this way, the system may learn that a user tends to apply a specific set of audio parameters, or sets of audio parameters with a given classification or sonic profile, to audio tracks of a given genre or sonic profile (and optionally at a certain time on a certain day or days of the week). Responsive, the system may present the same set of audio parameters, or a set of audio parameters

12 with a similar classification, to the user in relation to a new audio track of the same genre or sonic profile.

[0075] Figure 3B depicts set of audio parameters 304a comprising at least 5 subsets of audio parameters 306aa, 306ab, 306ac, 306ad, 306an but other numbers of subsets of audio parameters are contemplated. In some embodiments the number of subsets of audio parameters equals the number of initial audio components within the audio track 300. In some embodiments, each subset of audio parameters may correspond to, be associated with and/or be configured to be applied to a respective initial audio component. In other embodiments, in the interest of increasing computational efficiency and reducing computational load, one set of audio parameters may be duplicated by way of a reference or pointer, but not physically reproduced as a standalone copy.

[0076] As explained above in relation to the user interface 200, a user may manipulate, for instance, the mix of a song being presented at an audio output 108 by a user input on the user interface 200. This user input may represent a command to apply a set of audio parameters 304a, 304b, 304m to the audio track being presented at the audio output or selected initial audio components of the audio track, where those initial audio components are selected using the audio component selection controls 204. For instance, the user input may comprise the selection of a user- selectable region in the first plurality of user-selectable regions 202 (or mix selection controls), where each user-selectable region in the first plurality of user-selectable regions 202 is associated with a respective set of audio parameters. Accordingly, the selection of a region in the first plurality of user- selectable regions represents a command to apply the set of audio parameters associated with that region to the audio track or the chosen initial audio components of the audio track.

[0077] Figure 3C illustrates the process of applying a set of audio parameters 304a to an audio track 300 according to embodiments of the present invention. As explained above, the number of members of each set 304a, 304b, and 304m may equal the number of initial audio components, n, within the audio track 300, and each member of each subset of audio parameters may correspond to, be associated with and/or be configured to be applied to a respective initial audio component. The application of a set of audio parameters 304a has been depicted in figure 3C as an equation for illustrative purposes only (which should not be construed as limiting), where the multiplication sign (‘X’) has been used to represent the application, combination or implementation of a set of audio parameters 304a to audio track 300 and the equals sign ( -’) has been used to illustrate the end product of that application, combination or implementation.

[0078] As shown in figure 3C, the application of a set of audio parameters 304a to audio track 300 comprising a plurality of initial audio components 302a, 302b, 302c, 302d, 302n produces a plurality of parameterised audio components 308. Each parameterised audio component 310aa, 310ab, 310ac, 310ad, 310an may result from the application of one member of the set of audio parameters 304a to an initial audio component 302a, 302b, 302c, 302d, 302n respectively. In this way applying a set of audio parameters 304a to an audio track 300 (comprising a plurality of initial audio components) may comprise applying each member of the set of audio parameters 306aa, 306ab, 306ac, 306ad, 306an to an associated or a corresponding initial audio component 302a, 302b, 302c, 302d, 302n. In other embodiments, applying a set of audio parameters 304a to an audio

13 track 300 (comprising a plurality of initial audio components) may comprise applying a single member to multiple initial audio component 302a, 302b, 302c, 302d, 302n. For instance, the parameterised audio component 310aa may result from the application of audio parameters 306aa to initial audio component 302a, parameterised audio component 310ab may result from the application of audio parameters 306ab to initial audio component 302b, parameterised audio component 310ac may result from the application of audio parameters 306ac to initial audio component 302c, parameterised audio component 310ad may result from the application of audio parameters 306ad to initial audio component 302d, parameterised audio component 310an may result from the application of audio parameters 306an to initial audio component 302n.

[0079] Figure 3C illustrates the process of applying the set of audio parameters 304a to the Audio track 300. This same process can be used to apply any of the sets in the plurality of sets of audio parameters 312 to the audio track 300. In this way, applying any set of audio parameters 304a, 304b, 304m to an audio track 300 (comprising a plurality of initial audio components) may comprise applying each member of the selected set of audio parameters to an associated or a corresponding initial audio component 302a, 302b, 302c, 302d, 302n. The resulting plurality of parameterised audio components from applying set of audio parameters 304b to the audio track 300 would be 310ba, 310bb, 310bc, 310bd and 310bn. The resulting plurality of parameterised audio components from applying set of audio parameters 304m to the audio track 300 would be 310ma, 310mb, 310mc, 310md and 310mn.

[0080] The plurality of parameterised audio components 308 could thus comprise a number of variants of initial audio component 302a, namely 31 Oaa, 31 Oba and 31 Oma, a number of variants of initial audio component 302b, namely 31 Oab, 31 Obb and 31 Omb, and a number of variants of initial audio component 302n, namely 31 Oan, 31 Obn and 31 Omn.

[0081] It is appreciated that the plurality of parameterised audio components 308 may be presented at the audio output 108 (or played out or back) in the same or a similar way as described above in relation to the audio track 300. Further, each parameterised audio component 31 Oaa, 31 Oab, 31 Oac, 31 Oad, 31 Oan, to 31 Omn in the plurality of parameterised audio components 308 may be presented at the audio output 108 (or played out or back) in the same or a similar way as described above in relation to the initial audio components 302a, 302b, 302c, 302d, 302n. For instance, in some embodiments, presentation of the audio track 300 at the audio output 108, after the audio track has been manipulated, may comprise the simultaneous playback, at the audio output 108, of each parameterised audio component 31 Oaa, 31 Oab, 31 Oac, 31 Oad, 31 Oan, ... , 31 Omn in the plurality of parameterised audio components 308. The simultaneous playback may comprise the summation of each parameterised audio components in such a way so as to be presented as a single audio signal to the audio output 108. Master audio parameters may be added to the single audio signal. The simultaneous playback may additionally be such that the playback of each parameterised audio component begins within a first threshold time, and ends within a second threshold time, of the playback of each other parameterised audio component. First and second threshold times may be equivalent to a human perceptible latency threshold, such as 48ms, 24ms or 12ms.

14 [0082] Each parameterised audio component 310aa, 310ab, 310ac, 310ad, 310an, 310mn may be provided with an associated reference grid. This associated reference grid may be similar or equivalent to the associated reference grid discussed above in relation to the initial audio components 302a, 302b, 302c, 302d, 302n and may include markers at one or more constant positions within each parameterised audio component. These markers may label equivalent points within each parameterised audio component and may enable a reduction in latency between each parameterised audio component when parameterised audio components are presented at audio output 108. These markers may be provided at regular intervals, such as after at a constant number of samples within each parameterised audio component and may take the form of a data marker within the audio file. In some embodiments, playback instructions may include instructions to detect when the reference grid for one parameterised audio component is desynchronised from the reference grid for another parameterised audio component by greater than a threshold time (such as a human perceptible latency threshold, as above). Responsive to this detection, playback instruction may include instructions to correct for the desynchronization by, for instance, temporarily speeding up or slowing down one or more of the parameterised audio components or translating the audio in time by moving the pointer to the audio array.

[0083] The parameterised audio components 310aa, 310ab, 310ac, 310ad, 310an, ..., 310mn may be configured to be presented simultaneously at an audio output in a similar or the same way as each initial audio component 302a, 302b, 302c, 302d, and 302n may be configured to be presented simultaneously at an audio output. Additionally or alternatively, the plurality of parameterised audio components 308 may, when presented simultaneously at an audio output, be configured to have substantially the same volume as the initial audio components 302a, 302b, 302c, 302d, and 302n when presented simultaneously at an audio output. To achieve this, the system may be configured to analyse the volume of the plurality of parameterised audio components 308 when presented simultaneously at an audio output, compare the volume of the plurality of parameterised audio components 308 when presented simultaneously at an audio output to the volume of the initial audio components 302a, 302b, 302c, 302d, and 302n when presented simultaneously at an audio output. This analysis may be carried out by any conventional volume analysis means, such as by comparing the root mean square (RMS) or the peak and/or trough values of the waveforms. The system may be further configured to adjust the volume of the parameterised audio components 308 to be within a threshold difference in volume from the volume of the initial audio components 302a, 302b, 302c, 302d, and 302n when presented simultaneously. This threshold may be measured in terms of Loudness Unit Full Scale (LUFS) measurements measured in terms of a moving average over a number of time intervals and may be 2 decibels, 1.5 decibels, 1 decibel, 0.5 decibels, 0.25 decibels, or 0.1 decibels. A consequence of normalising volume in this way is that the user may manipulate the audio track more seamlessly and efficiently without needing to externally and/or manually change the volume level of the device.

[0084] In some embodiments, keeping the volume of the audio track being presented at the output substantially constant may require increasing the volume of an initial audio component that comprises a relatively high signal-to-noise ratio and/or unwanted sounds (such as a recorded sound

15 of a metronome or a guide track). Such noise or unwanted sounds, when the initial audio component is presented at the output in the context of the original audio track, may be inaudible. However, on increasing the volume of that initial audio component, the noise or unwanted sounds may become audible. To counter this, noise-cancelling algorithms may be employed to remove the unwanted sounds, or a noise gate may be used to remove the noise.

[0085] In some embodiments, a number of pluralities of parameterised audio components may be saved in a directory corresponding to an audio track 300 within the internal memory 110, optionally within the music content storage 114, as an audio file in any conventional format for storing audio data on a device 100, such as those outlined above. Each plurality of parameterised audio components may represent an alternative mix of the audio track 300. Each plurality of parameterised audio components may result from applying the set of audio parameters associated with each region in the first plurality of user-selectable regions 202 respectively to the audio track 300. One set of audio parameters may comprise unitary audio parameters which, when applied to the initial audio files, leave the initial audio files unchanged. In these embodiments, the pluralities of parameterised audio components may be pre-prepared, ahead of manipulating the audio track 300 and applying the sets of audio parameters to a plurality of initial audio components (or an audio track 300) may comprise unmuting the relevant plurality of parameterised audio components or preparing the relevant plurality of parameterised audio components for presentation at the audio output 108.

Mix selection controls - mutually exclusive regions

[0086] The mix selection controls enable a user to select a mix to be applied to one or more of the initial audio components of the audio track being presented at the audio output 108 by way of a user input on the user interface. This user input may comprise the selection of a region in the mix selection controls which corresponds to a specific mix. In other words, each region in the mix selection controls may correspond to a different mix of an audio track. In the embodiments depicted on figures 4A and 4B, each region in the mix selection controls is associated with a single mix and, as such, the selection of each region respectively results in mutually exclusive mixes being played back at the audio output 108.

[0087] Figures 4A and 4B depict embodiments of figure 2’s first plurality of user-selectable regions 202 or mix selection controls. These embodiments are described with reference to embodiments comprising a touch screen as the input mechanism 104 and the display 106, although other input mechanisms and displays are possible. The embodiments depicted on figures 4A and 4B include regions (such as regions 404, 406, 408, 410 and 412) in a first plurality of user-selectable regions 400 and 414. These regions are user-selectable. Each region may be discrete from and/or may not overlap with its respective neighbouring regions. In other words, each region 404, 406, 408, 410 and/or 412 may be mutually exclusive from each other region 404, 406, 408, 410 and/or 412. It should be appreciated that the first plurality of user-selectable regions 400 depicted in figure 4A and the first plurality of user-selectable regions 414 depicted in figure 4B may be functionally and practically equivalent in most respects other than their respective presentations, layouts and geometries on user interface 200 and/or axes 402. Figures 4A and 4B depict five regions 404, 406,

16 408, 410 and 412 in the first plurality of user-selectable regions 400, 414, however other numbers of regions are contemplated, such as 2, 3, 4, 6, 7, 8, 9 or 10 regions.

[0088] As described in relation to figures 2 and 3, the first plurality of user-selectable regions 400 may enable a user to select a set of audio parameters to be applied to one or more of the initial audio components of the audio track, chosen as described in relation to Figure 7. If no initial audio components are chosen, then the system may consider all the initial audio components to be chosen. The user selection of the audio parameters may be carried out via the input mechanism. Each region 404, 406, 408, 410 and/or 412 in the first plurality of user-selectable regions 400, 414 may be associated with a respective set of audio parameters. For instance, region 404 may be associated with a first set of audio parameters, region 406 may be associated with a second set of audio parameters, region 408 may be associated with a third set of audio parameters, region 410 may be associated with a fourth set of audio parameters and/or region 412 may be associated with a fifth set of audio parameters. Each of the first, second, third, fourth and fifth sets of audio parameters may be structured in the same way as figure 3B’s set of audio parameters 304. The differences between the first, second, third, fourth and fifth sets of audio parameters may be in the audio parameters themselves. One or more sets of audio parameters may comprise unitary audio parameters which, when applied to the initial audio files, leave the initial audio files unchanged.

[0089] A user may select a set of audio parameters to be applied to the chosen initial audio components of an audio track by means of a user input via the input mechanism 104. For example, a first user input may be received from the user via the input mechanism 104. The first user input may comprise the selection of a first region in the first plurality of user-selectable regions 400 (such as region 404). For instance, in touch screen embodiments, a user may input a first user input selecting a first region in the first plurality of user-selectable regions 400, by touching, on the touch screen, the first region. Responsive to receiving the first user input, a first set of audio parameters may be applied to an audio track (i.e. the plurality of initial audio components) optionally being presented at the audio output. Applying a first set of audio parameters to an audio track may be carried out in any of the manners described in relation to figure 3C or any other method of the present disclosure and may result in a plurality of parameterised audio components. Each parameterised audio component may correspond to an initial audio component. Next, the plurality of parameterised audio components may be presented at an audio output. This may be carried out by, for instance, compiling the plurality of parameterised audio components into one parameterised audio track and/or presenting the plurality of parameterised audio components simultaneously at the audio output by, for instance, playing the plurality of parameterised audio components back in a synchronised manner or summing the parameterised audio components in such a way so as to be presented as a single audio signal to the audio output 108 (such as in the manners described in relation to figures 3A and 3C).

[0090] The first plurality of user-selectable regions 400 and/or each region 404, 406, 408, 410 and/or 412 may be defined by a range of positions on an axes 402 of an associated coordinate grid. The display 106 may comprise the associated coordinate grid. The display may display the associated coordinate grid or, alternatively, the associated coordinate grid may be a notional,

17 abstract coordinate grid and need not be displayed. The axes 402 may comprise an x-axis, a y-axis and optionally a z-axis. The associated coordinate grid may be provided on, above, underneath or in relation to the first plurality of user-selectable regions. Additionally or alternatively, the axes 402 may comprise polar coordinates such as a radial distance, a polar angle and optionally a third dimension, such as a z-axis or an azimuthal angle.

[0091] The associated coordinate grid may define each region (such as regions 404, 406, 408, 410 and 412) in the first plurality of user-selectable regions 400 and/or the boundaries thereof. For instance, region 404, depicted for exemplary purposes as a square in the centre of first plurality of user-selectable regions 400, may be defined by a range of coordinates on axes 402 (e.g. coordinates falling within the square in the centre of the first plurality of user-selectable regions). A region (e.g. 404) may be defined by a Boolean function which, when applied to coordinates on the associated coordinate grid, results in a True (or 1 ) Boolean data type for coordinates falling within region 404 and a False (or 0) Boolean data type for coordinates falling outside region 404. An alternative Boolean function may also define region 406, region 408, region 410 and region 412 in the same way.

[0092] In some embodiments, the first user input (comprising the selection of the first region) may comprise an input of positional information on the coordinate grid. The positional information may comprise coordinates which, when applied to the Boolean function defining the first region, result in a True (or 1 ) Boolean data type. Therefore, the range of coordinates which, when applied to the Boolean function defining the first region result in a True Boolean data type, sum to define the first region.

[0093] The coordinate grid may be present on display 106 and may be present without being visible. For instance, in touch screen embodiments, the coordinate grid may be present on the touch screen, located in conjunction with the first plurality of user-selectable regions. In some embodiments, axes 402 may further comprise a z-axis, wherein the z-axis coordinate of the input of positional information may be determined by, for instance, 3D touch, where the force of touch on the touch screen is correlated with the z-axis coordinate. The z-axis coordinate may additionally or alternatively be determined by touch-time, swipe movement or physical movement of the phone. [0094] For instance, a user input may comprise a touch on the touch screen. The position of this touch may be described by coordinates on the coordinate grid. These coordinates may fall within the defined range of coordinates defining a first region (such as, for instance, region 404). Accordingly, the coordinates defining the position of the touch, when applied to the Boolean function defining the first region, result in a True Boolean data type. In this way, the touch on the touch screen may comprise the selection of the first region. Similar methods may be employed for other input mechanisms, such as mouse, a trackpad, a clicker, a button or a stylus,

[0095] In some embodiments, axes 402 comprise an x-axis, a y-axis and a z-axis and the coordinate grid may be a three-dimensional coordinate grid. The three-dimensional coordinate grid may be defined as a three-dimensional coordinate grid located about and extending from the device 100. In these embodiments, the input mechanism 104 may comprise any suitable input mechanism for inputting positional information about and extending from the device 100. Such input mechanism

18 may include an accelerometer within device 100, a locatable handset controller communicatively coupled to device 100, a GPS sensor within device 100, a visual sensor configured to detect motion, a microphone or impact pads. In some of these embodiments, display 106 may comprise a display within a virtual reality (VR) headset.

[0096] For instance, in the exemplary embodiment where input mechanism 104 comprises an accelerometer, a user input may comprise moving the device 100 (e.g. translating or rotating the device) such that a reading is measured on the accelerometer. The moving of the device may be described by coordinates on a coordinate grid located about and extending from the device. These coordinates may fall within the range of coordinates defining a first region (such as, for instance, region 404). In this way, the moving the device may comprise the selection of the first region.

[0097] As explained above, in some embodiments, the internal memory 110 may comprise a number of pluralities of parameterised audio components. Each plurality of parameterised audio components may be structured in the same way as plurality of parameterised audio components 308 and may correspond to the result of applying alternate sets of audio parameters to a specific audio track (such as the audio track 300 described above). The number of pluralities of parameterised audio components may equal the number of regions in the first plurality of user-selectable regions 400 and the alternate sets of audio parameters may correspond to the sets of audio parameters associated with each region (e.g. regions 404, 406, 408, 410 and 412).

[0098] For instance, in the embodiment which comprises five regions and five sets of audio parameters, there may be five pluralities of parameterised audio components stored within internal memory, which are the result of applying the corresponding members of the five subsets of audio parameters to each initial audio file respectively. In other embodiments, four of the five pluralities of parameterised audio components may be the result of applying the corresponding members of four subsets of audio parameters to each initial audio file respectively and one of the five pluralities of parameterised audio components may be an unchanged audio track (or an audio track to which unitary audio parameters have been applied).

[0099] In some embodiments in which the internal memory 110 comprises a number of pluralities of parameterised audio components, a single plurality of parameterised audio components comprised of one of each type of parameterised audio component may be presented at the audio output at a time. The presented plurality of parameterised audio components may be the active parameterised audio components. Each member of the plurality of parameterised audio components for the chosen plurality of initial audio components may be associated with a region and applying a first set of audio parameters to the plurality of initial audio components may encompass making the plurality of parameterised audio components associated with the selected region (e.g. the first region) and the chosen plurality of initial audio components the active plurality of parameterised audio components. For instance, the plurality of parameterised audio components associated with the selected region may be made the active plurality of parameterised audio components by presenting the plurality of parameterised audio components associated with the selected region and not presenting the pluralities of parameterised audio components associated with the chosen plurality of initial audio components which are not the active copy. In some embodiments, such as those

19 depicted on figures 4A and 4B (where each region 404, 406, 408, 410 and/or 412 in the plurality of user-selectable regions 400 is mutually exclusive from each other region 404, 406, 408, 410 and/or 412) only one plurality of parameterised audio components may be the active copy at any one time, however, this disclosure is not limited thereto.

[00100] A benefit of the present disclosure is that the user may modify or alter the mix or combination of mixes of a song being played back at an audio output 108 in an on-the-fly manner, i.e. while the song is being played back. As such, the system is configured to receive any number of subsequent user inputs and adapt the mix or combination of mix versions of each of the initial audio components of the song being presented at the audio output accordingly. Each subsequent user input may be treated in the same way as the initial user input. In this way, a user may update the combination of mix versions of each of the initial audio components being played back by means of an updated user input, from a user via the input mechanism 104.

[00101] Accordingly, any time after the first user input has been received, an updated user input may be received from the user via input mechanism 104. The updated user input may comprise the input of updated positional information on the associated coordinate grid. This updated positional information may be input in the same manner as the first user input. For instance, the updated positional information may comprise coordinates on the coordinate grid. Accordingly, the coordinates of the updated positional information may fall within the same user-selectable region as the first user- selectable region, in which case the mix of the audio track or the chosen plurality of initial studio components of the track being presented at the audio output may be configured to remain unchanged. Alternatively, the coordinates of the updated positional information may fall within a different user-selectable region to the first user-selectable region. For instance, if the first user input comprises the selection of user-selectable region 404, the updated user input may comprise the selection of user-selectable region 404, user-selectable region 406, user-selectable region 408, user- selectable region 410 or user-selectable region 412.

[00102] Responsive to receiving the updated user input, the plurality of parameterised audio components being presented at the audio output may be replaced, at the audio output 308, by an updated plurality of parameterised audio components. Here, the updated plurality of parameterised audio components is the result of applying the set of audio parameters associated with the updated positional information to the plurality of initial audio components. This set of audio parameters may correspond to the set of audio parameters which are associated with the region within which the coordinates of the updated positional information may fall. Applying this set of audio parameters to the audio track or the chosen plurality of initial audio components may be carried out in the same way as the applying a first set of audio parameters to an audio track. In particularly, the updated set of audio parameters may be applied to the audio track or the chosen plurality of initial audio components in any of the manners described in relation to figure 3C or any other method of the present disclosure.

[00103] Replacing the plurality of parameterised audio components being presented at the audio output with the updated plurality of parameterised audio components may be carried out in any manner which results in the audio output presenting the updated parameterised audio components,

20 rather than the initial plurality of parameterised audio components. For instance, replacing the plurality of parameterised audio components may include the simultaneous fade-out of the previous plurality of parameterised audio components at the audio output and fade-in of the combined updated parameterised audio components at the audio output. In some embodiments the fade-out and/or fade-in may comprise any volume profile and/or shape, such as linear, logarithmic, step, exponential or S-curve. In some embodiments the volume profiles of the simultaneous fade-out and fade-in may be configured such that at no instance does the overall volume being presented at the audio output vary by more than a threshold amount from the volume of the previous plurality of parameterised audio components or the updated plurality of parameterised audio components. This threshold may be measured in terms of Loudness Unit Full Scale (LUFS) measurements measured in terms of a moving average over a number of time intervals and may be 2 decibels, 1 .5 decibels, 1 decibel, 0.5 decibels, 0.25 decibels, or 0.1 decibels. A consequence of normalising volume in this way is that the user may manipulate the audio track more seamlessly without experiencing jarring variations in track volume.

[00104] In some embodiments, keeping the volume of the audio track being presented at the output substantially constant may require increasing the volume of an initial audio component that comprises a relatively high signal-to-noise ratio and/or unwanted sounds (such as a recorded sound of a metronome or a guide track). Such noise or unwanted sounds, when the initial audio component is presented at the output in the context of the original audio track, may be inaudible. However, on increasing the volume of that initial audio component, the noise or unwanted sounds may become audible. To counter this, noise-cancelling algorithms may be employed to remove the unwanted sounds, or a noise gate may be used to remove the noise.

[00105] In some embodiments the updated user input may comprise a gesture. The gesture may include a gesture on a touch screen, the movement of a cursor on a display, the movement of a device as measured by an accelerometer within the device. The input mechanism 104 may be configured to receiving a gesture comprising a movement on the associated coordinate grid. This movement may be a movement from the first input of positional information to the updated positional information of the updated user input. This movement may comprise a movement speed. For instance, if the first input of positional information comprises first coordinates on the coordinate grid and the updated positional information comprises updated coordinates on the coordinate grid, the gesture may include a movement on the axes 402 from the first coordinates to the updated coordinates. The movement may be carried out in a measured length of time and the movement speed may equal the distance between the first coordinates and the updated coordinates divided by the measured length of time. This movement speed may be equivalent to a gesture rate. In the examples given above, the gesture speed may include, for instance, the speed of the gesture on the touch screen, the speed of the cursor on the display or the speed of the movement of the device respectively.

[00106] The plurality of parameterised audio components being presented at the audio output may be replaced with an updated plurality of parameterised audio components at an update speed. For instance, where replacing the plurality of parameterised audio components being presented at

21 the audio output further includes the simultaneous fade-out of the previous plurality of parameterised audio components and fade-in of the combined updated parameterised audio components, the simultaneous fade-out and fade-in (or, in other words, crossfade) may be carried out at an update speed.

[00107] In some embodiments the update speed may be positively correlated with the movement speed. In other words, if the movement speed is high the update speed may also be high and, if the movement speed is low, the update speed may also be low. Consequentially, the gesture rate may correlate with the rate of crossfade between the previous plurality of parameterised audio components and the combined updated parameterised audio components. This allows a user to manipulate the audio track in a more interactive, intuitive and efficient manner as the user may control crossfade rate between different mixes without the requirement for additional inputs.

Mix selection controls - envelope regions

[00108] Figure 5 depicts an alternative embodiment of figure 2’s first plurality of user-selectable regions 202 or mix selection controls. This embodiment includes regions 504, 506, 508, 510, 512, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542 544 and 546, which are user-selectable. 17 regions are depicted but any number may be included. The embodiment of figure 5 additionally includes at least two envelope regions. Each envelope region may be defined by an envelope range of positions on the associated coordinate grid. For instance, in the embodiment depicted on figure 5, four envelope regions (envelope regions 514, 516, 518 and 520) are depicted as quarter circles with radii greater than half the side-length of the mix selection controls 500 and with centres at each vertex of the mix selection controls 500 respectively. Similarly, in the embodiment depicted on figure 5, envelope region 522 is depicted as a circle with radius less than half the side-length of the mix selection controls 500 and with its centre at the centre of the mix selection control 500. Five envelope regions (depicted by dotted lines) 514, 516, 518, 520 and 522 are included in figure 5 but any number greater than or equal to two may be included. A difference between the embodiment of figure 5 and the embodiment of figure 4A and 4B is that the regions in the embodiment of figure 5 need not be mutually exclusive. Rather, each region may be a sub-region of one or more envelope regions. For instance, with reference to figure 5, region 506 may be a sub-region of envelope region 514, region 526 may be a sub-region of envelope region 514 and envelope region 516, and region 540 may be a sub-region of envelope region 514, envelope region 516 and envelope region 522.

[00109] As explained above, the mix selection controls enable a user to manipulate the mix of a song being presented at an audio output by way of a user input on the user interface. In the embodiments depicted on figures 4A and 4B each region in the mix selection controls is associated with a single mix and, as such, the selection of each region respectively results in mutually exclusive mixes being applied to the audio track or the chosen initial audio components and presented at the audio output. However, in the embodiment depicted in figure 5, each region may be associated with at least one mix. Accordingly, the selection of a region may equate to a command to apply the audio parameters pertaining to a single mix or a combination of two or more mixes to the audio track or the chosen initial audio components of the audio track being presented at the audio output 108.

22 [00110] The first plurality of user-selectable regions 500 and/or each region 504, 506, 508, 510, 512, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542 544 and/or 546 may be defined by a range of positions on the axes 502 of an associated coordinate grid. Axes 502 and the associated coordinate grid may be equivalent to axes 402 and the associated coordinate grid described in relation to figures 4A and 4B. Similarly, the first plurality of user-selectable regions 500 and/or each region 504, 506, 508, 510, 512, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and/or 546 may be defined by a range of positions on axes 502 in a similar or the same way as regions 404, 406, 408, 410, 412 may be defined by a range of positions on axes 402. The display 106 may comprise the associated coordinate grid. The display may display the associated coordinate grid or, alternatively, the associated coordinate grid may be a notional, abstract coordinate grid and need not be displayed. Axes 502 may comprise an x-axis and a y-axis. Optionally, axes 502 may further comprise a z-axis. The associated coordinate grid may be provided on, above, underneath or in relation to the first plurality of user-selectable regions. Additionally or alternatively, axes 502 may comprise polar coordinates by, for instance, comprising a radial distance and a polar angle. Optionally, polar coordinates may include a third dimension, such as a z-axis or an azimuthal angle. Regions 504, 506, 508, 510, 512, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and/or 546 may be selected by a first user input which comprises a first input of positional information on the associated coordinate grid in the same or similar way to that described in relation to figures 4A and 4B.

[00111] Here, each envelope region may be defined by a Boolean function which, when applied to coordinates on the associated coordinate grid, results in a True (or 1 ) Boolean data type for coordinates falling within that envelope region and a False (or 0) Boolean data type for coordinates falling outside that envelope region. These Boolean functions may be similar to those described in relation to figure 4, except that some coordinates on the associated coordinate grid may result in a true Boolean data type for multiple envelope regions at the same time.

[00112] Each envelope region (such as envelope regions 514, 516, 518, 520 and 522) may be associated with a set of audio parameters. For instance, envelope region 514 may be associated with a set of audio parameters, envelope region 516 may be associated with another set of audio parameters, envelope region 518 may be associated with yet another set of audio parameters, envelope region 520 may be associated with yet another set of audio parameters and envelope region 522 may be associated with yet another set of audio parameters. One or more sets of audio parameters may comprise unitary audio parameters which, when applied to the initial audio files, leave the initial audio files unchanged.

[00113] As each regions 504, 506, 508, 510, 512, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and/or 546 may be a sub-region of one or more envelope regions 514, 516, 518, 520 and 522, each region may be associated with the one or more set of audio parameters which are associated with the envelope region or envelope regions of which each region may be a sub-region. For instance, user-selectable region 506, which is a sub-region of envelope region 514, may be associated with the set of audio parameters associated with envelope region 514. Similarly, user- selectable region 526, which is a sub-region of envelope region 514 and envelope region 516, may be associated with the set of audio parameters associated with envelope region 514 and the set of

23 audio parameters associated with envelope region 516. Similarly still, user-selectable region 540, which is a sub-region of envelope region 514, envelope region 516 and envelope region 522, may be associated with the set of audio parameters associated with envelope region 514, the set of audio parameters associated with envelope region 516 and the set of audio parameters associated with envelope region 522.

[00114] Some user-selectable regions (such as user-selectable regions 504, 506, 508, 510 and 512) may be a sub-region of no more than one envelope region. For instance, user-selectable region 504 is a sub-region of envelope region 522, user-selectable region 506 is a sub region of envelope region 514, user-selectable region 508 is a sub region of envelope region 516, user-selectable region 510 is a sub region of envelope region 518 and user-selectable region 512 is a sub region of envelope region 520.

[00115] User-selectable regions that are sub-regions of no more than one envelope region are associated with no more than one set of audio parameters, the set of audio parameters associated with the no more than one envelope region. For instance, sub-region 506 may be associated with the set of audio parameters associated with envelope region 514, sub-region 508 may be associated with the set of audio parameters associated with envelope region 516, sub-region 510 may be associated with the set of audio parameters associated with envelope region 518, sub-region 512 may be associated with the set of audio parameters associated with envelope region 520 and sub- region 514 may be associated with the set of audio parameters associated with envelope region 522. [00116] User-selectable regions which are sub-regions of no more than one envelope region regions (such as user-selectable regions 504, 506, 508, 510 and 512 of figure 5) may be treated and/or function in the same way as the mutually exclusive user-selectable region 404, 406, 408, 410 and/or 412 of the embodiments described in relation to figures 4A and 4B and any features and/or functionalities described in relation to those figures may apply.

[00117] For instance, the first plurality of user-selectable regions 500 may enable a user to select a set of audio parameters to be applied to the audio track or the chosen initial audio components of an audio track by means of a user input via input mechanism 104. The user input and/or input mechanism may take any form disclosed herein in relation to any embodiment. For example, a first user input may be received from the user via the input mechanism 104. The first user input may comprise the selection of a first user-selectable region which is a sub-region of no more than one envelope region (such as, for instance, user-selectable region 504, 506, 508, 510 or 512) in the first plurality of user-selectable regions 500. Responsive to receiving the first user input, a first set of audio parameters may be applied to audio track or the plurality of chosen initial audio components of the track optionally being presented at the audio output. Applying a first set of audio parameters to the audio track or the chosen initial audio components of an audio track may be carried out in any of the manners described in relation to figure 3C or any other embodiment of the present disclosure and may result in a plurality of parameterised audio components. Each parameterised audio component may correspond to an initial audio component. Next, the parameterised audio components may be combined in a manner suitable for presentation at an audio output. For instance, the parameterised audio components may be compiled into one parameterised audio track and/or

24 may be presented simultaneously at the audio output by, for instance, being played back in a synchronised manner.

[00118] Some user-selectable regions (such as user-selectable regions 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and 546) may each be a sub-region of at least two envelope regions. Some user-selectable regions (such as user-selectable regions 540, 542, 544 and 546) may be a sub-region of three envelope regions. In some embodiments, at least one user-selectable region may be a sub-region of greater than three envelope regions (not depicted on figure 5). For instance, user- selectable region 524 is a sub-region of envelope region 514 and envelope region 522, user- selectable region 540 is a sub-region of envelope region 514, envelope region 516 and envelope region 522 and user-selectable-region 526 is a sub-region of envelope region 514 and envelope region 516.

[00119] User-selectable regions that are sub-regions of at least two envelope regions may be associated with at least two sets of audio parameters, the sets of audio parameters associated with each of the at least two envelope regions. For instance, user-selectable region 524 may be associated with the set of audio parameters associated with envelope region 514 and the set of audio parameters associated with envelope region 522, user-selectable region 540 may be associated with the set of audio parameters associated with envelope region 514, the set of audio parameters associated with envelope region 516 and the set of audio parameters associated with envelope region 522 and user-selectable region 526 may be associated with the set of audio parameters associated with envelope region 514 and the set of audio parameters associated with envelope region 516.

[00120] A user-selectable region may be associated with at least two sets of audio parameters by way of an association with a set of audio parameters which comprises a ratio of the at least two sets of audio parameters. The ratio may comprise terms corresponding to each set of audio parameters in the at least two sets of audio parameters. For instance, user-selectable region 524 may be associated with a first set of audio parameters, which may comprise a ratio of audio parameters comprising a term corresponding to the audio parameters associated with envelope region 514 and a term corresponding to the audio parameters associated with envelope region 522. Similarly, user-selectable region 540 may be associated with a second set of audio parameters and that second set of audio parameter may comprise a ratio of audio parameters which may comprise a term corresponding to the audio parameters associated with envelope region 514, a term corresponding to the audio parameters associated with envelope region 516 and a term corresponding to the audio parameters associated with envelope region 522.

[00121] User-selectable regions which are sub-regions of at least two envelope regions (such as user-selectable regions 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and 546) may enable a user to select the same respective member from at least two subsets of audio parameters to be applied to an audio track or each of the chosen initial audio components of an audio track by means of a user input via input mechanism 104. The user input and/or input mechanism may take any form disclosed herein in relation to any embodiment. For example, a first user input may be received from the user via input mechanism 104. The first user input may comprise the selection of a first user-

25 selectable region which is a sub-region of at least two envelope regions (such as, for instance, user- selectable regions 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544 and 546) in the first plurality of user-selectable regions 500. Responsive to receiving the first user input, a first set of audio parameters may be applied to each of the individual initial audio components in the plurality of selected initial audio components of an audio track optionally being presented at the audio output, wherein the first set of audio parameters may comprise a ratio of audio parameters comprising terms corresponding to the set of audio parameters associated with each envelope region in the at least two envelope regions. Applying a first set of audio parameters to each of the initial audio components in the audio track or the chosen plurality of initial audio components of an audio track may be carried out in any of the manners described in relation to figure 3C or any other embodiment of the present disclosure and may result in at least one plurality of parameterised audio components. Additionally or alternatively, applying a first set of audio parameters which comprises a ratio of audio parameters may include applying sets of audio parameters in the ratio of audio parameters in series or in parallel. Next, the parameterised audio components may be combined in a manner suitable for presentation at an audio output. For instance, the parameterised audio components may be compiled into one parameterised audio track and/or may be presented simultaneously at the audio output by, for instance, being played back in a synchronised manner.

[00122] When each set of audio parameters are applied in parallel, each set of audio parameters may be applied to the audio track or the chosen initial audio components of the audio track independently of one another to produce multiple pluralities of parameterised audio components, one for each set of audio parameters in the ratio of audio parameters. Each plurality of parameterised audio components may be subsequently combined ahead of being presented at the audio output 108. In some embodiments, the relative volumes of each plurality of parameterised audio components, when combined, may be equal to the ratio of audio parameters.

[00123] When each set of audio parameters are applied in series, each set of audio parameters may be applied to the audio track or the chosen initial audio components of the audio track sequentially to produce a single plurality of parameterised audio components to be presented at the audio output 108. An advantage of applying sets of audio parameters in series is an increase in audio quality. For instance, key frequencies of an instrument may be preserved and emphasised in order to maintain that initial audio component’s position in the mix where, were the audio parameters applied in parallel, the initial audio component’s position in the mix may be lost.

[00124] For instance, user-selectable region 524 may be associated with a first set of audio parameters and that first set of audio parameters may comprise a ratio of audio parameters which may comprise a term corresponding to the audio parameters associated with an envelope region (e.g. envelope region 514) and a term corresponding to the audio parameters associated with another envelope region (e.g. envelope region 522). When this first set of audio parameters is applied to the audio track or the chosen plurality of initial audio components of an audio track, a plurality of parameterised audio components which results from applying audio parameters associated with the envelope region 514 to audio track 300 may be produced alongside a plurality of parameterised audio components which results from applying audio parameters associated with the other envelope

26 region 522 to the audio track or the chosen plurality of initial audio components of audio track 300. The initial audio components are paired by type of underlying audio such as by specific instrument or vocal such that each pair comprises the two parameterised audio components for that type of initial audio component. Each of the pairs of parameterised audio components may then be combined at a volume ratio equivalent to the ratio of audio parameters, to produce a version of each type of initial audio component that is partly comprised of the version of that initial audio component associated with mix 506, and partly comprised of the initial audio component associated with the mix 504. This process is performed in parallel for each of the types of initial audio component but may additionally be carried out in series.

[00125] In some embodiments the ratio of audio parameters may be constant across each user- selectable region. Suitable ratios include: 1 :1 , 1 :2, 2:1 , 2:3, 3:2, 1 :1 :1 , 1 :2:1 , 1 :2:2, 1 :2:3 or any permutation thereof. In some of these embodiments the ratio of audio parameters may equal one. For instance, when the ratio of audio parameters equals one, the ratio may be, if the ratio comprises two terms, 1 :1 , 50:50 or 100:100, if the ratio comprises three terms, 1 :1 :1 , 50:50:50 or 100:100:100 and, if the ratio comprises four terms, 1 :1 :1 :1 , 50:50:50:50, 100:100:100:100 or the like.

[00126] In some embodiments, the relative volumes and/or ratios of each plurality of parameterised audio components may be different for each grouping of like-parameterised audio components. ‘Like-parameterised audio components’ are parameterised audio components which result from applying different sets of audio parameters to the same initial audio component (e.g. initial audio component 302a). Accordingly, the parameterised audio components which result from a given initial audio component (e.g. initial audio component 302a), may have different ratios and/or relative volumes to the parameterised audio components which result from a separate given initial audio component (e.g. initial audio component 302b). This method may enable the quality of the manipulated audio tracks (i.e. the mix of the audio track being presented at the audio output) to be increased by, for instance, enabling like-parameterised audio components which, if mixed equally, would otherwise cause auditory masking due to an undesirably high density of frequency components in particularly dense frequency ranges, to be mixed unequally. Thus, the potential for auditory masking may be mitigated.

[00127] For instance, user-selectable region 524 may be associated with a set of audio parameters associated with envelope region 514 and a set of audio parameters associated with envelope region 522. Upon the selection of user-selectable region 524, the set of audio parameters associated with envelope region 514 may be applied to the audio track and the set audio parameters associated with envelope region 522 may also be applied to the audio track. The result may be a plurality of parameterised audio components which comprises, for each initial audio component, a parameterised audio component resulting from applying the set of audio parameters associated with envelope region 514 and a parameterised audio component resulting from applying the set of audio parameters associated with envelope region 522. Next, when the parameterised audio components are combined in a manner suitable for presentation at the audio output, the volume of each parameterised audio component for each initial audio component may be weighted individually.

27 [00128] By implementing such individual weighting, the set of audio parameters associated with an envelope region 514 (for instance) and the set of audio parameters associated with another envelope region 522 (for instance) may be combined in a manner which maintains audio quality of the audio track for the listener. For instance, one initial audio component may be low frequency content-heavy (such as, a bass guitar stem, or a drum kit stem, or the like). If the set of audio parameters associated with an envelope region (e.g. 514) and the set of audio parameters associated with another envelope region (e.g. 522) were applied equally to this bass heavy audio component, the resulting manipulated audio track may lose sound quality. In particular, sound quality of the manipulated audio track may be lost due to auditory masking, where the perception of the audio track being presented at the audio output is negatively affected by the quantity of low-frequency information in the resulting manipulated audio track. For instance, the manipulated audio track may be perceived as “muddy” due to its high quantity of low frequency information. Accordingly, this low sound quality may be mitigated by reducing the volume of the resultant from the set of audio parameters associated with an envelope region 514 being applied and increasing the volume of the resultant from the set of audio parameters associated with another envelope region 522 being applied. The amount being applied may be reduced/increased by individually weighting the volume of each parameterised audio component respectively. In contrast, the sound quality of an audio track may not be diminished by the inclusion of a high quantity of high frequency information and, as such, the set of audio parameters associated with envelope region 514 and the set of audio parameters associated with envelope region 522 may be applied equally to a high frequency content-heavy audio component (such as a lead guitar stem or vocals stem). In some embodiments, applying the set of audio parameters associated with envelope region 514 and the set of audio parameters associated with envelope region 522 to high frequency content-heavy audio components may reduce the sound quality of the manipulated audio track, in which case the respective resultants may be weighted to compensate accordingly.

[00129] In some embodiments, the system maybe configured to detect which like-parameterised audio components, if presented at an audio output at an equal volume ratio or a constant volume ratio with the other parameterised audio components, would reduce the sound quality of the resultant manipulated audio track. For instance, the system may be configured to determine whether the sound quality of the manipulated audio track would be improved if the relative volumes or ratios varied for each like-parameterised audio components. This determination or detection may be carried out by spectral analysis methods of the manipulated audio track being presented at the audio output. Spectral analysis methods may include any conventional method for analysis audio in frequency space, such as Fourier analysis, fast Fourier transforms (FFT) and the like. Additional determination may be carried out by comparing the audio signals to ensure no constructive or destructive interference has become present due to phasing. Comparing audio signals also allows peaks to be checked to determine if they need to be compressed before combination.

[00130] Spectral analysis methods may include determining whether a particular frequency range is overcrowded, determining which parameterised audio components fall within the overcrowded frequency range. The relative volumes or ratios of like plurality of parameterised audio

28 components may then be adjusted to reduce the overcrowding. Determining whether a particular frequency range is overcrowded may comprise determining that the presence of sounds in that frequency range exceeds an auditory masking threshold, wherein the auditory masking threshold equals the threshold at which a human may be able to distinguish sounds from different sources within a particular frequency range. This threshold may be a function of frequency.

[00131] Additionally or alternatively, the system may be configured to apply a further audio parameter (e.g. a band-pass filter, low pass filter or other parametric EQ) to reduce the overcrowding or to carryout audio cleaning in the frequency domain. One or more of these methods may be carried out by the processor. These methods may be carried out automatically via computational means, such as, for instance, by an artificial intelligence.

[00132] Audio cleaning may be carried out in order to increase sound quality by, for instance, ensuring that key frequency ranges are not overcrowded. Such overcrowding may result in auditory mask at rate greater than an auditory masking threshold equal to the threshold at which a human may be able to distinguish sounds from different sources within a particular frequency range. This threshold may be a function of frequency. Audio cleaning may be preceded by a determination that the audio requires cleaning by spectral analysis methods or any conventional method for analysis audio in frequency space, such as Fourier analysis, fast Fourier transforms (FFT) and the like. Audio cleaning may be carried out by, for instance, applying a further set of audio parameters (e.g. a band pass filter, low pass filter or other parametric EQ) to the parameterised audio components.

[00133] Figure 6 depicts the mix selection controls 500, 600 of figure 5 on which a user input has been received at position Z. A user input may be received in any manner disclosed herein, such as via an input mechanism 104. Position Z may be defined by coordinates on an axes 602. Axes 602 may comprise an associated coordinate grid and may be equivalent to axes 402 described in relation to figures 4A and 4B or axes 502 described in relation to figure 5. Any of the functionality and or features described in relation to position Z may be generalised to any position or sub-region within first plurality of user-selectable regions 400, 414, 500 or 600.

[00134] Position Z is depicted as falling within user-selectable region 642. User selectable region 642 is a sub-region of envelope region 616, envelope region 618 and envelope region 622. As described in relation to figure 5, each envelope region may be associated with a respective set of audio parameters. Accordingly, user-selectable region 642 may be associated with each of the sets of audio parameters associated with envelope region 616, envelope region 618 and envelope region 622. Additionally or alternatively, as described in relation to figure 5, user-selectable region 642 may be associated with a first set of audio parameters, wherein the first set of audio parameters comprise a ratio of audio parameters. The ratio of audio parameters may comprise a term corresponding to the set of audio parameters associated with envelope region 616, a term corresponding to the set of audio parameters associated with envelope region 618 and a term corresponding to the set of audio parameters associated with envelope region 622.

[00135] In some embodiments, the ratio of audio parameters may be constant across user- selectable region 642. Similarly, the ratio of audio parameters may be constant across any given user-selectable region.

29 [00136] In other embodiments, the ratio of audio parameters may vary with positional information, even within a given user-selectable region. Positional information may include coordinates on the coordinate grid. Accordingly, each term within the ratio of audio parameters may vary with the distance between the positional information and the boundary of the term’s respective envelope region. Additionally or alternatively, each term within the ratio of audio parameters may vary with the distance between the positional information and the boundary or a vertex of first plurality of user-selectable regions 600. Additionally or alternatively, each term within the ratio of audio parameters may vary with the distance between the positional information and the centre of the term’s respective envelope region. Any variation of a term within the ratio may be in an inversely correlated manner, a proportional manner, a linear manner, a polynomial manner, a root mean squared manner or the like.

[00137] For instance, position Z may be a distance c from a vertex (‘C’) of the first plurality of user-selectable regions 600, a distance d from a vertex (‘D’) of the first plurality of user-selectable regions 600 and a distance z from the centre of envelope region 622 (‘A’). Position Z may additionally be a distance rc-c from the boundary of envelope region 616, a distance rD-d from the boundary of envelope region 618 and a distance rA-z from the boundary of envelope region 622. Here rc may be the distance between a vertex (‘C’) of the first plurality of user-selectable regions 600 falling on the boundary of envelope region 616 and another boundary of envelope region 616, rD may be the distance between a vertex (‘D’) of the first plurality of user-selectable regions 600 falling on the boundary of envelope region 618 another boundary of envelope region 618 and rA may be the distance between the center of envelope region 622 (‘A’) and a boundary of envelope region 622. [00138] In some embodiments the ratio of audio parameters - i.e. audio parameters associated with envelope region 622 (AP622): audio parameters associated with envelope region 614 (ARQM): audio parameters associated with envelope region 616 (ARbib): audio parameters associated with envelope region 618 (ARbib): audio parameters associated with envelope region 620 (AP620) - may be defined in terms of the equation:

[00139] In the above equation, the function (i/(x < y), 1,0) means, if x < y, output 1 , otherwise output 0 (this may be the Boolean function defining each envelope region). Accordingly, in the example given above (i.e. position Z on figure 6) the terms corresponding to ARQM and AP620 in the above equation equal 0. As such, the ratio of audio parameters for specific position Z on figure 6 equals

[00140] If position Z were to be moved such that distance c were increased, distance r_c-c would be decreased. Accordingly, in the embodiment where a term within the ratio of audio parameters may vary with the distance between the positional information and the boundary or a vertex of first plurality of user-selectable regions 600 (such as is defined by the above equation), a term within the ratio of audio parameters varies with the distance between the positional information and the boundary of the term’s respective envelope region, the term associated with envelope region 616

30 may vary with distance r_c-c. For instance, the term associated with envelope region 616 may decrease as distance r_c-c increases (in an inversely proportional manner, a negatively correlated manner, a linear manner, a polynomial manner or a root mean squared manner or the like).

[00141] Alternatively, in an embodiment where a term within the ratio of audio parameters varies with the distance between the positional information and the boundary of the term’s respective envelope region, the term associated with envelope region 616 may vary with distance r_c-c. For instance, the term associated with envelope region 616 may increase as distance r_c-c increases (in a proportional manner, a positively correlated manner, a linear manner, a polynomial manner or a root mean squared manner or the like).

[00142] The same functionality and logic also applies to distances d and z. Further, in the foregoing description distances are described in relation to position Z within user-selectable region 642, although the logic and functionality may be generalised to other positions and/or other user- selectable regions.

[00143] A consequence of varying terms in the foregoing manner is that the ratio of audio parameters being applied to the audio track or the chosen initial audio components of the audio track being presented at an output (described in relation to figure 5) may be varied as a function of user- input position in an efficient manner. By varying a single user-input, the user may vary the ratio of audio parameters being applied to the audio track or the chosen individual audio components of the audio track being presented at an output. In some embodiments, the relative volumes and/or ratios of each plurality of parameterised audio components may be varied equally for each grouping of like- parameterised audio components.

[00144] In some embodiments, the relative volumes and/or ratios of each plurality of parameterised audio components may be varied differently for each grouping of like-parameterised audio components. ‘Like-parameterised audio components’ are parameterised audio components which are resultants from applying different sets of audio parameters to the same initial audio component, as explained above in relation to figure 5.

[00145] Relative volumes and/or ratios of each plurality of parameterised audio components may be varied differently for each grouping of like-parameterised audio components by associating each grouping of like-parameterised audio components with a sensitivity, wherein the sensitivity may relate to the frequency content of the like-parameterised audio components. The sensitivity may dictate how the volumes and/or ratios of each plurality of like-parameterised audio components vary with position (such as the variation of position Z). The sensitivity may be a function of position. Example functions include any type of conventional function such as linear or polynomial functions. Example functions also include step functions. By defining the sensitivity as a step function for certain like- parameterised audio components, the relative volumes and/or ratios of the certain like-parameterised audio components may be flipped at a threshold position. This may be especially beneficial for maintaining audio quality of the audio track for the listener for similar reasons to those described in relation to figure 5. Specifically, it may be the case that one initial audio component may be low frequency content-heavy (such as, a bass guitar stem, or a drum kit stem, or the like) and it may be desirable to have an uneven mix of parameterised audio components related to that initial audio

31 components. By defining the sensitivity as a step function for these like-parameterised audio components, these initial audio components may be presented at the audio output in an uneven mix which may still be varied with position.

[00146] In any of the embodiments depicted on or described in relation to figures 3-6, the volume of the audio track being presented at the audio output may be kept substantially constant. Responsive to applying a set of audio parameters to produce a plurality of parameterised audio components, the parameterised audio components may be combined in such a way that the audio track being presented at the audio output after applying a set of audio parameters may be within a threshold volume deviation of the audio track being presented at the audio output before applying a set of audio parameters. This threshold volume deviation may be 2 decibels, 1 .5 decibels, 1 decibel, 0.5 decibels, 0.25 decibels, or 0.1 decibels, or any human-imperceptible volume deviation.

[00147] The volume of the audio track being presented at the audio output may be kept substantially constant in any conventional way. For example, a master set of audio parameters may be applied to the volume being presented at the audio output. The master set of audio parameters may include compression, level, limiting and EQ. In another embodiment the level of the audio track being presented at the audio output may be kept substantially constant by means of a level control. A consequence of normalising volume in this way is that the user may manipulate the audio track more seamlessly and efficiently without experiencing jarring variations in track volume.

[00148] In some embodiments, keeping the volume of the audio track being presented at the output substantially constant may require increasing the volume of an initial audio component that comprises a relatively high signal-to-noise ratio and/or unwanted sounds (such as a recorded sound of a metronome or a guide track). Such noise or unwanted sounds, when the initial audio component is presented at the output in the context of the original audio track, may be inaudible. However, on increasing the volume of that initial audio component, the noise or unwanted sounds may become audible. To counter this, noise-cancelling algorithms may be employed to remove the unwanted sounds, or a noise gate may be used to remove the noise, based on a signal to noise ratio being identified as above a predetermined threshold.

[00149] In order to keep the volume of the audio track being presented at the audio output substantially constant, prior to adjusting the volume level or applying a master set of audio parameters, the system may be configured to detect a deviation from the substantially constant volume. This may be carried out by any conventional volume analysis means, such as by comparing the root mean square (RMS) or the peak and/or trough values of the waveforms. This may additionally be carried out by comparison of Loudness Unit Full Scale (LUFS) measurements measured in terms of a moving average over a number of time intervals.

Initial audio component controls

[00150] The initial audio component controls 204, 700 enable a user to choose (or select) the initial audio component or components which are to be adjusted via the mix selection controls. Each region in the initial audio component controls is associated with a specific initial audio component (or stem). Selecting a region in the initial audio component controls may either mute, unmute or solo the

32 specific initial audio component associated with that region or select the specific initial audio component associated with that region. If a specific initial audio component is selected, the user may adjust the audio for that specific initial audio component via the mix selection controls 202, 400, 414, 500, 600. In this way, the user may adjust the mix of one specific audio component individually in the audio track being presented at the audio output.

[00151] Figure 7 depicts an embodiment of figure 2’s second plurality of user selectable regions 204, or initial audio component controls. These embodiments are described in reference to a touch screen as the input mechanism 104 and the display 106, although other input mechanisms and displays are possible. The embodiment depicted on figure 7, include regions 704a-704h, 706 and 708a-708h. These regions are user-selectable. Each region may be discrete from and/or may not overlap with its respective neighbouring user-selectable regions. Some or all regions in the second plurality of user selectable regions may be associated with a respective initial audio component. [00152] Multiple regions may be associated with a single initial audio component and may provide different functionality with respect to that initial audio component. For instance, regions 704a- 704h may pair with regions 708a-708h respectively (where 704a and 708a both relate to the same initial audio component, 704b and 708b both relate to the same initial audio component and so on). Here regions 704a-704h may enable a user to select (i.e. “choose”) or deselect the respective associated initial audio component to be manipulated individually and regions 708a-708h may enable a user to select, mute, unmute or solo the respective associated initial audio component. The embodiment depicted on figure 7 includes 17 regions, corresponding to 8 initial audio components, however any number of regions are contemplated. In some embodiments the second plurality of user-selectable regions 700 may comprise 2n+1 regions, where n is the number of initial audio components in the audio track being presented at the audio output.

[00153] For example, at least one user input may be received from the user via the input mechanism 104. This at least one input may comprise the selection of at least one region (such as region 704a) in the second plurality of user-selectable regions 700. In the embodiments comprising a touch screen, a user may, via an input of at least one user input, select at least one region in the second plurality of user-selectable regions 700 by touching, on the touch screen, that at least one region. The effect of this at least one input is that the initial audio file associated with the selected region or regions (e.g. region 704a) may be selected. If the initial audio file associated with the selected region or regions are already selected, the effect of this at least one input is to deselect the selected initial audio files.

[00154] A region in the second plurality of user-selectable regions, such as region 706, may enable a user to select all the initial audio components at once. In this way the user may select and/or deselect every initial audio component in an efficient manner.

[00155] Next, a subsequent user input may be receiving on the first plurality of user selectable regions 400, 414, 500 and 600 (i.e. the mix selection controls, as described above). The subsequent user input may take any form described above in relation to the first plurality of user-selectable regions 400, 414, 500 and 600 and may comprise the selection of a region in the first plurality of user-selectable regions 400, 414, 500 and 600.

33 [00156] As a consequence of receiving the above user input and subsequent user input, a member in the subset of audio parameters associated with the selected region in the first plurality of user-selectable regions 400, 414, 500 and 600 may be applied to the initial audio component associated with the selected region in the second-plurality of user selectable regions 700. This produces an updated parameterised audio component which corresponds to the initial audio component associated with the selected region in the second-plurality of user selectable regions 700. Responsive to receiving the above user input and subsequent user input, the parameterised audio component being presented at the audio output which corresponds to the initial audio component associated with the selected region in the second-plurality of user selectable regions 700 may be replaced with the updated parameterised audio component.

[00157] A user may select a region (e.g. any of regions 704a-704h, 706 and 708a-708h) in the second plurality of user-selectable regions 700 in the same way as a user may select a region in the first plurality of user-selectable regions. For instance, a user may select a region in the second plurality of user-selectable regions 700 by way of a user input via the input mechanism 104. This user input may represent a command for the system to carry out the action associated with the selected region in the second plurality of user-selectable regions.

[00158] The second plurality of user-selectable regions 700 may be associated with a coordinate grid. This coordinate grid is distinct from the coordinate grid associated with the first plurality of user- selectable regions but may be functionally identical in many or all aspects. In the same way as the coordinate grid associated with the first plurality of user-selectable regions may comprise an axes 402, 502, the coordinate grid associated with the second plurality of user-selectable regions may also comprise an axes 702.

[00159] The second plurality of user-selectable regions 700 and/or each component region 704a- 704h, 706 and 708a-708h may be defined by a range of positions on an axes 702 of the associated coordinate grid. The display 106 may comprise the associated coordinate grid. The display may display the associated coordinate grid, or the associated coordinate grid may be a notional, or abstract coordinate grid and need not be displayed on the display. The axes 702 may comprise an x-axis, a y-axis and optionally a z-axis. The associated coordinate grid may be provided on, above, underneath or in relation to the first plurality of user-selectable regions. Additionally or alternatively, axes 702 may comprise polar coordinates such as a radial distance, a polar angle and optionally a third dimension, such as a z-axis or an azimuthal angle.

[00160] The associated coordinate grid may define each region 704a-704h, 706 and 708a-708h in the second plurality of user-selectable regions 700. The associated coordinate grid may define the boundaries of each region 704a-704h, 706 and 708a-708h in the second plurality of user-selectable regions 700. For instance, regions 704a-704h, depicted for exemplary purposes as quadrilaterals at the edges of the second plurality of user-selectable regions 700, may be defined by a range of coordinates on axes 702 (e.g. coordinates falling within the square in the centre of the first plurality of user-selectable regions). Flere, a region (e.g. 704a) may be defined by a Boolean function which, when applied to coordinates on the associated coordinate grid, results in a True (or 1 ) Boolean data type for coordinates falling within region 704a and a False (or 0) Boolean data type for coordinates

34 falling outside region 704a. An alternative Boolean function may also define regions 704b-704h, 706 and 708a-708h.

[00161] Some regions in the second plurality of user-selectable regions, such as regions 708a- 708h (depicted on figure 7 as wedge-shaped controls), may enable a user to mute or solo a specific initial audio component being presented at the audio output. These regions may be selected via the input mechanism 104 under one of two input regimes, such as a short press regime or a press and hold regime. For instance, in the touch screen embodiments, a user may select a region under the short press regime by tapping the region on the touch screen. Conversely, in the touch screen embodiments a user may select a region under the long press regime by touching and holding the region on the touch screen. In other embodiments (which do not include a touch screen as the input mechanism 104), other user inputs may correspond to the two regimes.

[00162] Selecting one of regions 708a-708h in one regime, such as the short press regime, may mute the initial audio file or parameterised audio component (whichever is being presented at the audio output) associated with the selected region at the audio output 108. This has the effect of ceasing the presentation of the initial audio file or parameterised audio component associated with the selected region at the audio output 108. Similarly, selecting one of the regions 708a-708h in the other regime, such as the long press regime, may solo the initial audio file or parameterised audio component (whichever is being presented at the audio output) associated with the selected region at the audio output 108. This has the effect of ceasing the presentation of all initial audio files and/or parameterised audio components at the audio output except those which are associated with the selected region.

[00163] Responsive to the selection of any region in the second plurality of user-selectable regions 700, the volume of the parameterised audio components or audio track being presented at the audio output may be normalised. In this manner, the overall volume of the parameterised audio components or audio track being presented at the audio output may be consistent before and after selection any of the regions in the second plurality of user-selectable regions. For example, the overall volume of an entire mix may be equal to the overall volume of a soloed initial audio component (as measured in terms of LUFS measurements in terms of a moving average over a number of time intervals).

[00164] Accordingly, the system may be configured to detect when the overall volume being presented at the audio output after the selection of any region in the second plurality of user- selectable regions 700 differs from the overall volume being presented at the audio output before the selection of any region in the second plurality of user-selectable regions 700 by more than a threshold amount and to adjust the overall volume being presented at the audio output to be within that threshold. The detection may be carried out by any conventional volume analysis means, such as by comparing the root mean square (RMS) or the peak and/or trough values of the waveforms. [00165] The threshold amount may be measured in terms of Loudness Unit Full Scale (LUFS) measurements measured in terms of a moving average over a number of time intervals and may be a human perceptible threshold volume, such as 2 decibels, 1 .5 decibels, 1 decibel, 0.5 decibels, 0.25

35 decibels, or 0.1 decibels. A consequence of normalising volume in this way is that the user may manipulate the audio track more seamlessly without experiencing jarring variations in track volume. [00166] In some embodiments, keeping the volume of the audio track being presented at the output substantially constant may require increasing the volume of (or, indeed, soloing) an initial audio component that comprises a relatively high signal-to-noise ratio and/or unwanted sounds (such as a recorded sound of a metronome or a guide track). Such noise or unwanted sounds, when the initial audio component is presented at the output in the context of the original audio track, may be inaudible. However, on increasing the volume of that initial audio component, the noise or unwanted sounds may become audible. To counter this, noise-cancelling algorithms may be employed to remove the unwanted sounds, or a noise gate may be used to remove the noise.

Various methods disclosed herein

[00167] Figures 8A-8C and 9 depict illustrative flowcharts of some of the methods disclosed herein in relation to the control surface, user interface and components thereof described above. The steps of the methods are described in relation to a touch screen on a smartphone, but those skilled in the art will appreciate that methods 800, 808, 816 and 900 may be performed on other devices, such as those disclosed above. Additional steps may be present and some of the steps may be performed in parallel or in an alternate order.

[00168] Figure 8A is a flowchart illustrating a method 800 for manipulating an audio track 300 being presented at an audio output 108. The audio track 300 may comprise a plurality of initial audio components (for instance, as described above in relation to figure 3A).

[00169] According to method 800, a user input is received at step 802. This user input may take the form of a touch or a tap on a touch screen and comprises selecting a user-selectable region in a first plurality of user-selectable regions. The first plurality of user selectable regions may refer to the first plurality of user-selectable regions, or mix selection controls, 202, 400, 414, 500 or 600 described in relation to figures 2, 4A, 4B, 5 or 6. The selected user selectable region may be associated with a set of audio parameters.

[00170] Next, at step 804, the corresponding members of a set of audio parameters may be applied to each of the individual initial audio components in the plurality of selected initial audio components within the audio track 300 being presented at an audio output 108. The set of audio parameters are associated with the user selectable region selected by the user input at step 802. The set of audio parameters may be applied to the individual initial audio components in the plurality of selected (or chosen) initial audio components in any manner disclosed herein, such as that disclosed in relation to figure 3C and results in a plurality of parameterised audio components, wherein each parameterised audio component corresponds to an initial audio component. The currently selected set of audio parameters are not applied to any unselected initial audio components, which continue to be presented at audio output 108 unadjusted by the currently selected set of audio parameters.

[00171] Additionally or alternatively, in embodiments in which the internal memory 110 comprises a number of pluralities of parameterised audio components, applying a set of audio parameters to a

36 plurality of initial audio components may optionally comprise unmuting a plurality of parameterised audio components and muting other pluralities of parameterised audio components. Here, the unmuted plurality of parameterised audio components is that which result from applying, to an audio track 300, the set of audio parameters associated with the user selectable region selected by the user input at step 802.

[00172] Ahead of being presented at the audio output 108 at step 806, the system may be configured to determine that the parameterised audio components being presented at the audio output contain auditory masking which exceeds an auditory masking threshold (not depicted on figure 8A). The auditory masking threshold may be equal to the threshold at which a human may be able to distinguish sounds from different sources within a particular frequency range. This threshold may be a function of frequency. This determination may be carried out by spectral analysis methods or any conventional method for analysis audio in frequency space, such as Fourier analysis, fast Fourier transforms (FFT) and the like. Responsive, the method may remove or lessen this auditory masking (e.g. to below the auditory masking threshold) by applying a further set of audio parameters (e.g. a band-pass filter, low pass filter or other parametric EQ) to the parameterised audio components. [00173] Finally, at step 806, the parameterised audio components (or the unmuted plurality of parameterised audio components) are presented at the audio output 108. The parameterised audio components may be presented at the audio output until playback of the parameterised audio components reaches completion or the user inputs a command to pause or stop playback.

[00174] Figure 8B is a flowchart illustrating method 808 which presents an alternative to steps 804 and 806 of method 800. These alternative steps may be present in the embodiment where the user input of step 802 comprises selecting a user-selectable region which is a sub-region of a first envelope region, a second envelope region and optionally a third envelope region, as described herein in relation to figure 5 and figure 6.

[00175] According to method 808, at step 804, sets of audio parameters associated with the first envelope region, the second envelope region and optionally the third envelope region may be applied in parallel at steps 810a, 810b and 810c respectively to the individual audio components which comprise the plurality of selected initial audio components within the audio track 300 being presented at an audio output 108. Applying each set of audio parameter may be carried out in the same manner as disclosed in relation to step 804 of method 800, or in any manner disclosed herein.

[00176] Responsive to steps 810a, 810b, and 810c being carried out, the resulting parameterised audio components from applying sets of audio parameters associated with the first envelope region, the second envelope region and optionally the third envelope region respectively may be combined at step 812. This combination may comprise weighting the volume of each parameterised audio component in accordance with the ratio of audio parameters associated with the selected user- selectable region. As discussed in relation to figures 5 and 6, this ratio may be a function of the user input’s positional information and need not be equivalent for each like-parameterised audio component. Combining parameterised audio components may further comprise normalising the volume being presented at the audio output 108, as discussed herein.

37 [00177] Finally, the combined parameterised audio components may be presented at the audio output at step 814, as disclosed in relation to step 806 of method 800.

[00178] Figure 8C is a flowchart illustrating method 816 which is an extension of method 800, further comprising receiving an updated user input at step 824 and replacing parameterised audio components being presented at the audio output at step 826. Method steps 818, 820 and 822 may be equivalent to method steps 802, 804, and 806 of method 800 respectively.

[00179] At step 824, method 814 further comprises receiving an updated user input. The updated user input may take the form of a touch or a tap on a touch screen and comprises selecting an updated user-selectable region in a first plurality of user-selectable regions. The first plurality of user selectable regions may refer to the first plurality of user-selectable regions, or mix selection controls, 202, 400, 414, 500 or 600 described in relation to figure 2, 4A, 4B 5 or 6. The updated user selectable region may be associated with a set of audio parameters.

[00180] Responsive to receiving an updated user input at step 824, the plurality of parameterised audio components being presented at the audio output may be replaced with an updated plurality of parameterised audio components at step 826. The updated plurality of parameterised audio components may be the result of applying a set of audio parameters associated with the updated user-selectable region.

[00181] Figure 9 is a flowchart illustrating a method 900 for manipulating a single (or at least one) initial audio component of an audio track 300 being presented at an audio output 108.

[00182] According to method 900, a user input is received on the initial audio component controls (i.e. the second plurality of user-selectable regions 700, 204) at step 902. This user input may take the form of a touch or a tap on a touch screen and comprises selecting a user-selectable region in a second plurality of user-selectable regions. The second plurality of user selectable regions may refer to the second plurality of user-selectable regions, or initial audio component controls, 204 or 700 described in relation to figures 2 or 7. The selected user-selectable region may be associated with an initial audio component and the selection of the selected user-selectable region may equate to the selection of an initial audio component

[00183] Next, a user input is received on the mix selection controls (i.e. the first plurality of user- selectable regions 202, 400, 414, 500, 600) at step 802. This user input may take the form of a touch or a tap on a touch screen and comprises selecting a user-selectable region in a first plurality of user- selectable regions. The selected user-selectable region may be associated with a set of audio parameters.

[00184] Next the subset of audio parameters from the set of audio parameters associated with the selected user-selectable region may be applied to the selected initial audio component(s) at step 908. The set of audio parameters may be applied to the plurality of initial audio components in any manner disclosed herein, such as that disclosed in relation to figure 3C or described above in relation to figures 8A-8C and results in an updated parameterised audio component which corresponds to the initial audio component associated with the second user-selectable region.

[00185] Finally, at step 910, the parameterised audio component being presented at the audio output 108 which corresponds to the selected initial audio component may be replaced with the

38 updated parameterised audio component. A result of this step is that the previous parameterised audio component stops being presented at the audio output 108 and, in its place, the updated parameterised audio component is presented at the audio output 108.

[00186] It will be understood that the features described in connection with one or more exemplary embodiments may be combined with features described in connection with other embodiments. Moreover, components described herein may be substituted for structurally similar or functionally equivalent components. Such modifications will be understood to fall within the scope of the present disclosure.

[00187] It will additionally be understood that certain terminology is used in the previous detailed description for convenience and is not limiting. The terms ‘a’, ‘an’ and ‘the’ should be read as meaning ‘at least one’. The term ‘comprising’ and ‘including’ will be understood to mean ‘including but not limited to’ such that systems or method comprising a particular feature or step are not limited to only those features or steps listed but may also comprise features or steps not listed.

[00188] It will also be appreciated by those skilled in the art that modifications may be made to the exemplary embodiments described herein without departing from the invention. Structural features of systems and apparatuses described herein may be replaced with functionally equivalent parts. Moreover, it will be appreciated that features from the embodiments may be combined with each other without departing from the disclosure.

39

Claims

Claims What is claimed is:

1 . A computer-implemented method for manipulating, via an input mechanism, an audio track being presented at an audio output, the audio track comprising a plurality of initial audio components configured to be presented simultaneously at the audio output, the input mechanism in communication with a display, the display presenting a first plurality of user-selectable regions, each user-selectable region in the first plurality of user-selectable regions being associated with a respective set of audio parameters, the method comprising: receiving a first user input from a user via the input mechanism, wherein the first user input comprises selecting a first user-selectable region in the first plurality of user- selectable regions; applying a first set of audio parameters associated with the first user-selectable region to the plurality of initial audio components to produce a plurality of parameterised audio components, wherein each parameterised audio component corresponds to an initial audio component; and presenting the parameterised audio components at the audio output.

2. The computer-implemented method of claim 1 , wherein each user-selectable region is defined by a range of positions on an x-axis and a y-axis of an associated coordinate grid, wherein the first user input comprises a first input of positional information on the associated coordinate grid, and wherein selecting a first user-selectable region is based on the input of positional information.

3. The computer implemented method of claim 2, wherein each user-selectable region is further defined by a range of positions on a z-axis of the associated coordinate grid.

4. The computer implemented method of claim 2 or 3, wherein the display comprises the associated coordinate grid.

5. The computer-implemented method of any of claims 2 to 4, wherein the first user-selectable region is a sub-region of a first envelope region and a second envelope region, wherein the first envelope region is defined by a first envelope range of positions on the associated coordinate grid and the second envelope region is defined by a second envelope range of positions on the associated coordinate grid, wherein the first envelope region and the second envelope region overlap, each of the first and second envelope regions being associated with a respective set of audio parameters, and wherein the first set of audio parameters comprises a ratio of audio parameters, the ratio of audio parameters comprising a term corresponding to the audio parameters associated with the first envelope region and a term corresponding to the audio parameters associated with the second envelope region.

6. The computer-implemented method of claim 5, wherein the first user-selectable region is additionally a sub-region of a third envelope region, wherein the ratio of audio parameters further comprises a term corresponding to the audio parameters associated with the third envelope region.

7. The computer-implemented method of claim 5 or 6, wherein the ratio of audio parameters is a constant across the first user-selectable region.

8. The computer-implemented method of any of claims 5 to 7, wherein the ratio of audio parameters equals one.

40 The computer-implemented method of claim 5 or 6, wherein the ratio of audio parameters varies with the positional information of the first input of positional information. The computer-implemented method of claim 9, wherein the term corresponding to the audio parameters associated with the first envelope region is inversely correlated with a distance between the positional information and a boundary of the first envelope range of positions on the associated coordinate grid, and the term corresponding to the audio parameters associated with the second envelope region is inversely correlated with a distance between the positional information and a boundary of the second envelope range of positions on the associated coordinate grid. The computer-implemented method of claim 9 or 10, wherein the term corresponding to the audio parameters associated with the first envelope region is proportional to a distance between the positional information and a center of the first envelope range of positions on the associated coordinate grid, and the term corresponding to the audio parameters associated with the second envelope region is proportional to a distance between the positional information and a center of the second envelope range of positions. The computer-implemented method of any of claims 2 to 11 , further comprising: receiving an updated user input from a user via the input mechanism, wherein the updated user input comprises updated positional information on the associated coordinate grid; replacing, at the audio output, the plurality of parameterised audio components being presented at the audio output with an updated plurality of parameterised audio components, wherein the updated plurality of parameterised audio components is the result of applying a set of audio parameters associated with the updated positional information on the coordinate grid to the plurality of initial audio components. The computer-implemented method of claim 12, wherein the input mechanism is configured to receive a gesture, wherein the gesture comprises a movement on the associated coordinate grid from the first input of positional information to the updated user input, the movement comprising a movement speed, wherein the plurality of parameterised audio components being presented at the audio output are replaced with an updated plurality of parameterised audio components at an update speed, the update speed being positively correlated with the movement speed, optionally wherein the input mechanism comprises at least one of: a touch screen, the gesture comprising a gesture on the touch screen; or a cursor on the display, the gesture comprising the movement of the cursor. The computer-implemented method of any of claims 1 to 14, wherein the input mechanism comprises at least one of a touch screen, a mouse, a trackpad, a clicker, an accelerometer, a button, a stylus, a microphone, a handset controller, a visual sensor and/or a GPS sensor. The computer-implemented method of any preceding claim, wherein applying a first set of audio parameters comprises applying a first set of audio parameters to the plurality of chosen initial audio components. The computer-implemented method of any preceding claim, further comprising: determining that the parameterised audio components being presented at the audio output contain auditory masking which exceeds an auditory masking threshold; and applying a second set of audio parameters to the parameterised audio components to lessen the auditory masking to below the auditory masking threshold. The computer-implemented method of any preceding claim, further comprising receiving a second user input from a user via the input mechanism, wherein the second user input comprises

41 a command to one of: pause, fast-forward or rewind the audio track being presented at the audio output. The computer-implemented method of any preceding claim, wherein the first set of audio parameters comprises a plurality of subsets of audio parameters, each associated with a respective initial audio component, wherein applying a first set of audio parameters associated with the first user-selectable region to the plurality of initial audio components comprises applying each subset of audio parameters to the associated initial audio component. The computer-implemented method of any preceding claim, wherein the plurality of initial audio components comprise one or more of: track-length audio components, stems, lossless audio files, lossy audio files, individually recorded audio files, composites of individually recorded audio files, stereo audio files, mono audio files, pre-processed audio files, or audio files reconstructed using MIDI. The computer-implemented method of any preceding claim, wherein audio parameters include any of: tempo, key, gain, volume, pan position, equalisation, compression, limiter controls, reverb, delay, distortion, chorus, vibrato, tremolo, pitch shift, software effects, or hardware effects which perform mathematical manipulation of the audio signal. The computer-implemented method of any preceding claim, further comprising applying a master set of audio parameters to the combined parameterised audio components. The computer-implemented method of any preceding claim, wherein any of the sets of audio parameters are pre-prepared, user-defined or learned based on previous user behaviour. The computer-implemented method of any preceding claim, the display presenting a second plurality of user-selectable regions, the user-selectable regions in the second plurality of user- selectable regions being associated with a respective initial audio component, the method further comprising: receiving a third user input from a user via the input mechanism, wherein the third user input comprises selecting a second user-selectable region in the second plurality of user-selectable regions; receiving a fourth user input from a user via the input mechanism, wherein the fourth user input comprises selecting a third user-selectable region in the first plurality of user- selectable regions; applying a subset of audio parameters in a third set of audio parameters which are associated with the third user-selectable region to the initial audio component associated with the second user-selectable region to produce an updated parameterised audio component which corresponds to the initial audio component associated with the second user-selectable region; and responsive to receiving the third user input and the fourth user input, replacing, at the audio output, the parameterised audio component being presented at the audio output which corresponds to the initial audio component associated with the second user-selectable region with the updated parameterised audio component. A computer readable medium, comprising instructions which, when executed by a processor, cause the processor to perform the method of claims 1 -23. A system with an internal memory, a processor, an input mechanism, a display and an audio output, the processor configured to perform the method of claims 1 -23.

42