GB2581319A - Automated music production - Google Patents

Automated music production

Info

Publication number
GB2581319A
Authority
GB
United Kingdom
Prior art keywords
music
musical
audio
production
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1820266.3A
Other versions
GB201820266D0 (en)
GB2581319B (en)
Inventor
Philip Newton-Rex Edmund
Trevelyan David
Nicholas Chanquion Pierre
Cooper Jonathan
James Steward Robert
Andrew Storey Jason
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ByteDance Inc
Original Assignee
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ByteDance Inc filed Critical ByteDance Inc
Priority to GB1820266.3A priority Critical patent/GB2581319B/en
Publication of GB201820266D0 publication Critical patent/GB201820266D0/en
Priority to PCT/IB2019/060674 priority patent/WO2020121225A1/en
Publication of GB2581319A publication Critical patent/GB2581319A/en
Application granted granted Critical
Publication of GB2581319B publication Critical patent/GB2581319B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G10H 7/00 - Instruments in which the tones are synthesised from a data store, e.g. computer organs
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/0025 - Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 1/0066 - Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G06F 16/63 - Querying (information retrieval of audio data)
    • G10H 2210/056 - Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
    • G10H 2210/111 - Automatic composing, i.e. using predefined musical rules
    • G10H 2210/115 - Automatic composing using a random process to generate a musical note, phrase, sequence or structure
    • G10H 2210/121 - Automatic composing using a random process, using a knowledge base
    • G10H 2210/151 - Music composition or musical creation using templates, i.e. incomplete musical sections, as a basis for composing
    • G10H 2210/576 - Chord progression
    • G10H 2240/021 - File editing, i.e. modifying musical data files or streams, for MIDI-like files or data streams
    • G10H 2240/075 - Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H 2240/085 - Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
    • G10H 2240/121 - Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H 2240/131 - Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H 2250/311 - Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Abstract

A computer implemented method, preferably using artificial intelligence, of rendering music into audio format comprising receiving user-defined music production parameters such as tempo, duration and musical intensity and using them to produce a custom music piece in digital musical notation format (e.g. MIDI). The custom music piece is rendered into audio format for output to the user. Prior to the rendering step being completed, a preview audio render is created using pre-generated music segments stored in audio format. The segments have been generated by producing multiple sections of music according to different predetermined music production parameters. The segments are stored with associated metadata indicating the production parameters used to produce them and the preview audio render is created by matching sections of the custom music piece to different ones of the pre-generated music segments, based on the user-defined production parameters and the metadata, and sequencing the selected segments. A user device for creating an audio render of a custom music piece, involving downloading pre-generated music segments from a remote system, is also disclosed.

Description

AUTOMATED MUSIC PRODUCTION
Technical field
This disclosure relates to automated music production, in which music is produced in digital musical notation format and rendered therefrom into audio format.
Background
Automated music production based on artificial intelligence (AI) is an emerging technology with significant potential. Research has been conducted into training AI systems, such as neural networks, to compose original music based on a limited number of input parameters. Whilst this is an exciting area of research, many of the approaches developed to date suffer from problems of flexibility and quality of the musical output, which in turn limits their usefulness in a practical context.
An AI music production system provided under the name Jukedeck offers users significantly more choice and flexibility in the control they have over the production of AI music, which extends for example to control over its composition and/or arrangement by sophisticated AI engines. Users can cause tracks in multiple musical styles and of high musical quality to be generated according to customised music production parameters which drive various aspects of the AI composition and arrangement processes in a simple, effective and intuitive manner.
Summary
In the context of a music production system in which a custom piece of music (track) is produced in digital music notation format according to user-defined music production parameters, and then rendered into audio format for outputting to the user, the rendering process in particular can cause a significant delay between the time at which the user provides the music production parameters and the time at which the audio render becomes available for listening. A significant factor is the computing resources that are required to render digital music (i.e. music in digital musical notation format) into a self-contained audio render. In order to achieve a final audio render of high musical quality, typically multiple virtual instruments (synthesisers) are used in conjunction with various digital effects processing in order to achieve an overall musical effect. Careful musical variation is generally needed to achieve the desired musical effect, which typically means introducing some form of time-modulation into the settings applied by the virtual instruments, effects etc. (referred to as automation). This generally means that, in order to facilitate customisation, custom tracks have to be rendered from "scratch" by driving the virtual instruments and digital effects individually according to events encoded in the digital track to be rendered. This is time consuming as a consequence of the significant computational resources that are required. For example, it may take several minutes to fully render a digital track of even medium complexity into audio format.
The present invention allows a "preview" audio render of a digital track to be generated extremely quickly before a "full" audio render of that track is available. The full audio render is generated in the manner described above, whereas the preview track is an approximation created using pre-generated music segments stored in audio format. Audio data of the pre-generated music segments are sequenced in order to provide a reasonable and informative approximation of the final audio render before the generation of the latter has completed (and, in some embodiments, before it has even begun).
A first aspect of the present invention provides a computer-implemented method of rendering music into audio format, the method comprising: receiving, at a music production system, one or more music production parameters defined by a user for producing a custom piece of music; using the user-defined music production parameters to produce a custom piece of music in digital musical notation format; rendering the custom piece of music into audio format for outputting to the user; and creating, for outputting to the user before the rendering of the custom piece of music has completed, a preview audio render thereof, using pre-generated music segments stored in audio format, the music segments having been generated by producing multiple sections of music according to different predetermined music production parameters, and rendering the multiple sections of music into audio format. The pre-generated music segments are stored in association with production metadata indicating the predetermined music production parameters used to produce them. The preview render is created by matching sections of the custom piece of music to different ones of the pre-generated music segments, based on the user-defined music production parameters and the production metadata, and sequencing audio data of the different pre-generated music segments, the preview render comprising the sequenced audio data.
For the avoidance of any doubt, it is noted that, unless otherwise indicated, the term "production" is used in a broad sense to refer to any aspect or aspects of the music creation process, including composition, arrangement, performance, audio mixing etc. This applies equally to terms such as music production parameters and production metadata. Thus, for example, the user-defined music production parameters may be high-level parameter(s) such as duration and/or "sync point" timing (see below); whereas the predetermined music production parameters used to generate the music segments may be low-level, generally musical in nature, and are not exposed or visible to the user (e.g. a choice of neural networks to use for machine learning composition, choice of musical parts to use etc.). Where the user provides high-level music production parameters, in certain embodiments, lower-level music production parameters may be determined in dependence on these (such as musical parts for different sections), which can be used to produce the custom piece of music and which can also be compared to corresponding low-level parameters used to generate the music segments as indicated by the production metadata, in order to carry out the above matching.
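As an illustration of the matching step, the sketch below (Python) compares the production parameters determined for each section of the custom piece against the production metadata stored with each pre-generated audio segment, and picks the closest segment for each section. The field names (parts, intensity) and the simple scoring rule are assumptions made for the example; this is a minimal sketch of the general idea, not the system's actual matching logic.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AudioSegment:
    clip_id: str
    metadata: dict   # production metadata stored with the rendered clip

@dataclass
class Section:
    params: dict     # low-level production parameters determined for this section

def score(section_params: dict, clip_metadata: dict) -> int:
    """Count how many production parameters the clip's metadata matches."""
    return sum(1 for k, v in section_params.items() if clip_metadata.get(k) == v)

def match_sections(sections: List[Section], clips: List[AudioSegment]) -> List[AudioSegment]:
    """For each section of the custom piece, choose the pre-generated clip
    whose production metadata best matches the section's parameters."""
    return [max(clips, key=lambda c: score(s.params, c.metadata)) for s in sections]

# Hypothetical example: two sections matched against a small clip pack.
clips = [
    AudioSegment("clip_a", {"parts": ("chords",), "intensity": "low"}),
    AudioSegment("clip_b", {"parts": ("chords", "melody", "drums"), "intensity": "high"}),
]
sections = [Section({"parts": ("chords",), "intensity": "low"}),
            Section({"parts": ("chords", "melody", "drums"), "intensity": "high"})]
print([c.clip_id for c in match_sections(sections, clips)])   # ['clip_a', 'clip_b']
```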
The step of rendering the custom piece of music into audio format may be instigated in response to a full render request received from the user after the preview render has been made available for outputting to the user. This provides the user with an opportunity to listen to the preview, before deciding whether or not to instigate the full rendering process (for example, having listened to the preview render, he may decide to make further changes to the music production parameters). This is beneficial in terms of system resources, as it prevents resources from being consumed in generating full audio renders that are unwanted.
In embodiments at least two of the pre-generated music segments may differ in at least one of the following respects: musical dynamics, duration, tempo, melodic content, musical parts, combination of musical parts, musical function of musical parts, or instrument or sound characteristics. That or those differences may be indicated by the music production metadata.
The music segments may have been generated initially in digital musical notation format and rendered therefrom into audio format.
For example, at least two of the music segments in digital musical notation format may have different: automation settings, dynamics settings, instrument or sound settings, symbolic music data, or digital effects settings. That or those differences may be indicated by the music production metadata.
The one or more music production parameters comprise at least one of: a duration for the custom piece of music, and a timing for a musical event (such as a concentration of musical intensity) within the custom piece of music, which may be referred to herein as a "sync point".
The one or more user-defined music production parameters may be processed by at least one music production component of the music production system so as to autonomously determine one or more further music production parameters based thereon, wherein the one or more autonomously-determined music production parameters are used to produce the custom piece of music, and the sections of the custom piece of music are matched to said different ones of the pre-generated music segments by comparing the autonomously-determined music production parameters with the production metadata.
For example, the one or more autonomously-determined music production parameters may comprise at least one of: one or more musical part parameters for the custom piece of music, one or more section parameters defining the sections of the custom piece of music (e.g. their durations and/or relative ordering), and one or more composition settings for the custom piece of music (e.g. for selecting, from a set of available probabilistic sequence models of a composition engine, one or more probabilistic sequence models for autonomously composing music for at least one section of the custom piece of music).
For example, the one or more user-defined music production parameters may indicate one or more user requirements (such as a user-defined track duration and/or one or more sync points (each) having a user-defined timing), and further music production parameter(s) may be determined for producing the custom piece of music in a way that conforms to the user-defined duration and/or sync point(s) (the user's requirements). The further music production parameter(s) may for example define the sections of the custom piece of music (e.g. their ordering and/or duration) and/or one or more musical attributes of each section (e.g. its musical parts, a musical function of musical parts, its musical type such as verse, chorus, middle eight etc.) in a way that satisfies the user requirements and makes sense musically.
The music production component may for example be an artificial intelligence music production component.
Further or alternatively, the sections of the custom piece of music may be matched to said different ones of the pre-generated music segments by comparing the user-defined music production parameters with the production metadata.
The custom piece of music may exhibit musical variations determined from the music production parameters, and the sections are matched to the pre-generated music segments to approximate the musical variations in the preview render.
The musical variations may be determined from an intensity curve defined by the music production parameters, and the sections are matched to the pre-generated music segments by matching a portion of the intensity curve in each section to at least one of the pre-generated music segments.
At least one of the music segments may have been generated with a flat intensity curve having a single intensity value across a duration of that music segment. Further or alternatively, at least one of the music segments may have been generated with a time-varying intensity curve having different intensity values within a duration of that music segment.
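To make the intensity-matching concrete, the following sketch shows one reasonable way it could be implemented (an assumption for illustration, not the patented implementation): the user-defined intensity curve is approximated by choosing, for each section, the pre-generated segment whose stored intensity value is closest to the average of the curve over that section.

```python
from statistics import mean

# Hypothetical clip pack: each clip was rendered at a known (flat) intensity.
clip_intensities = {"calm": 0.2, "building": 0.5, "climax": 0.9}

def match_intensity(curve: list, section_bounds: list) -> list:
    """Match each section's portion of the intensity curve to the clip
    whose stored intensity is nearest to that portion's mean value."""
    chosen = []
    for start, end in section_bounds:
        target = mean(curve[start:end])
        clip = min(clip_intensities, key=lambda c: abs(clip_intensities[c] - target))
        chosen.append(clip)
    return chosen

# A rising curve sampled per bar, split into three 4-bar sections.
curve = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9, 1.0, 0.9]
print(match_intensity(curve, [(0, 4), (4, 8), (8, 12)]))  # ['calm', 'building', 'climax']
```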
The musical variations are introduced by varying at least one of the following: dynamic settings, automation settings, tempo, composition settings, musical parts, a musical function of at least one musical part, and instrument or sound settings.
The custom piece of music may be a custom arrangement of an existing piece of music and the music production parameters may comprise one or more arrangement parameters for determining the custom arrangement.
The custom arrangement may be an arrangement of pre-determined music elements.
At least one of the predetermined music elements may be re-composed automatically, by a composition engine of the music production system, to fit the custom arrangement.
The music production parameters may comprise one or more composition parameters and a composition engine of the music production system may be caused to autonomously compose at least one music element for use in at least one section of the custom arrangement based on the composition parameters.
The custom piece of music may be produced by determining section templates for the sections of the custom piece of music to be produced, and multiple pre-generated music segments may be associated with each section template for matching to a section having that section template.
Each section template may comprise at least one of an upper intensity limit and a lower intensity limit. A section template may be selected for each section of the custom piece of music by comparing the portion of the intensity curve in that section with the intensity limit(s) of that section template, and the music segments associated with each section template may be generated within the intensity limit(s) of that section template.
The section template may define at least one of: musical parts, a musical function of at least one musical part, and a mapping of intensity curve values to automation settings.
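A minimal sketch of what such a section template might look like in code is given below; the field names and the containment test are illustrative assumptions, shown only to make the template-selection idea concrete.

```python
from dataclasses import dataclass

@dataclass
class SectionTemplate:
    name: str
    parts: tuple            # musical parts the template uses, e.g. ("chords", "melody")
    lower_intensity: float  # lower intensity limit
    upper_intensity: float  # upper intensity limit

    def fits(self, curve_portion) -> bool:
        """True if every intensity value in the section lies within the template's limits."""
        return all(self.lower_intensity <= v <= self.upper_intensity for v in curve_portion)

templates = [
    SectionTemplate("sparse_intro", ("chords",), 0.0, 0.3),
    SectionTemplate("full_chorus", ("chords", "melody", "drums"), 0.6, 1.0),
]

def select_template(curve_portion):
    # Pick the first template whose intensity limits contain the section's curve portion.
    return next(t for t in templates if t.fits(curve_portion))

print(select_template([0.1, 0.2, 0.25]).name)   # sparse_intro
print(select_template([0.7, 0.9, 0.8]).name)    # full_chorus
```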
The music production parameters may comprise one or more performance parameters which are used to introduce performance variation into at least one section of the custom arrangement.
Another aspect of the invention provides a particularly efficient mechanism for creating an audio render of a custom piece of music, in which an overall music production process is implemented in a distributed fashion between a local user device and a remote music production system in communication therewith.
According to this aspect, a user device for creating an audio render of a custom piece of music comprises: a network interface for communicating with a remote music production system; a user interface for receiving user inputs from a user of the user device; memory; and one or more processors configured to execute computer-readable music production code which is configured, when executed, to cause the one or more processors to carry out the following operations: downloading from the remote music production system and storing in the memory of the user device a set of pre-generated music segments in audio format; processing user inputs received at the user interface to determine at least one music production parameter for producing a custom piece of music; generating and transmitting to the music production system at least one electronic message comprising the at least one music production parameter; receiving from the music production system arrangement instructions for creating at the user device an audio render of the custom piece of music; and sequencing audio data of at least two of the pre-generated music segments according to the arrangement instructions, and thereby creating an audio render of the custom piece of music comprising the sequenced audio data.
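The sketch below illustrates this distributed flow from the user device's side. The endpoint paths, message fields and helper names are hypothetical, chosen only to show the download, request and local-sequencing steps; the actual protocol is not specified here.

```python
import requests  # assumed HTTP transport; the patent does not prescribe one

SERVER = "https://music-production.example.com"   # hypothetical remote system

def build_preview(track_id: str, duration_s: float, sync_point_s: float) -> bytes:
    # 1. Download the pre-generated clip pack for the selected track and cache it locally.
    clip_pack = requests.get(f"{SERVER}/tracks/{track_id}/clips").json()
    clips = {c["clip_id"]: requests.get(c["audio_url"]).content for c in clip_pack["clips"]}

    # 2. Send the user-defined music production parameters to the remote system.
    params = {"duration": duration_s, "sync_points": [sync_point_s]}
    resp = requests.post(f"{SERVER}/tracks/{track_id}/arrangements", json=params)

    # 3. Receive arrangement instructions: an ordered list of clip IDs to sequence.
    arrangement = resp.json()["clip_sequence"]

    # 4. Sequence the cached audio data locally to create the render.
    #    (Naive byte concatenation; real sequencing would operate on decoded PCM.)
    return b"".join(clips[clip_id] for clip_id in arrangement)
```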
A further aspect of the invention provides a computer program product comprising computer-readable code stored on a non-transitory computer readable storage medium, which is configured, when executed on one or more processors, to carry out any of the above operations.
Brief Description of Figures
For a better understanding of the present invention, and to show how embodiments of the same may be carried into effect, reference is made to the following figures in which: Figure 1 shows a schematic block diagram of a music production system; Figure 2 shows how an incoming job request may be handled by a music production system; Figure 3 shows a high-level overview of a music production system with the core system components arranged in a stack; Figure 4 shows a schematic block diagram of a composition engine; Figure 5 illustrates one example architecture of a composition engine for generating music segments for multiple musical parts; Figure 6 shows a flow chart for a method of generating a track in response to a request from a user; Figure 7 shows a schematic illustration of a possible structure of a settings database; Figure 8 illustrates a hierarchical selection mechanism for selecting track settings; Figure 9 shows a schematic block diagram of an application programming interface; Figure 10 shows a flow diagram illustrating a method of editing a musical track; Figure 11 shows a schematic block diagram of a music production system which incorporates preview rendering functions; Figure 12 shows a schematic illustration of a track audio clip pack; Figure 13 illustrates by example the principles by which musical variations may be approximated in a preview render; Figure 14 illustrates by example the principles by which a preview render may be created; Figure 15 shows a functional block diagram of a preview rendering component; and Figure 16 shows a system architecture in which preview rendering functions may be implemented effectively.
Detailed Description of Example Embodiments
In the described embodiments, music is pre-generated in a music production system in a way that can be arranged subsequently in a customizable fashion. A pre-generated piece of music may be referred to herein as a track or song. A track comprises a set of pre-generated music segments in digital musical notation format that can be arranged in different ways.
When pre-generating this music, for each track, an "audio clip pack" is created - that is, a number of musical segments, but in audio format (audio clips), that can be combined later. With each segment, metadata describing the segment is stored (production metadata). That metadata describes how the segment is permitted to be used in a musical arrangement.
A user can subsequently select any pre-generated track via a front-end interface provided by the music production system (such as a website, API etc.). Once selected, they can input certain user-set parameters, such as (i) track duration and (ii) point(s) at which the music should climax (sync points).
These user-set parameters are passed to a back-end of the music production system, where an arrangement is automatically created that satisfies the parameters. The metadata associated with each segment -which describes how that segment can be used in the arrangement -is, at this point, used to create an arrangement that uses the segments in ways they are permitted to be used. In this context, the term "arrangement" refers to a set of instructions for sequencing audio data of selected audio clips (preview rendering instructions 132 in Figure 11 -see below).
In the described examples, this is a two-step process in which an arrangement envelope is generated based on the user-set parameters, and then a selection of the audio clips is made, using the associated metadata, to fit the arrangement envelope. The metadata indicates properties of each audio clip such as its constituent musical parts, their musical function, the clip's duration and/or musical intensity etc. In the described example, the musical parts and their musical function are indicated by way of an association (which forms part of the metadata) between the audio clip and a "section template".
More generally, the user-set parameters and the metadata are used in combination to select audio clips for the preview arrangement and determine an order in which the selected audio clips are to be sequenced. That is, the metadata is used to determine an order in which audio data of the audio clips are to be sequenced, so that the clips are not used in orders that do not work musically.
The requisite segments from the audio clip pack are then sequenced in the order the arrangement dictates. This provides a preview of the track that the user can listen to virtually immediately.
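As a concrete illustration of this final sequencing step, the sketch below concatenates the selected clips' audio samples in the order the arrangement dictates, with a short linear crossfade at each join. It assumes the clips have already been decoded to mono float arrays at a common sample rate; both the crossfade and the overall simplicity are assumptions made for the example, not the system's actual mixing logic.

```python
import numpy as np

def sequence_clips(clips, sample_rate=44100, crossfade_s=0.05):
    """Concatenate decoded audio clips in arrangement order, crossfading each join."""
    xf = int(crossfade_s * sample_rate)
    out = clips[0].astype(np.float32)
    for clip in clips[1:]:
        clip = clip.astype(np.float32)
        fade = np.linspace(0.0, 1.0, xf)
        # Overlap the tail of the running render with the head of the next clip.
        out[-xf:] = out[-xf:] * (1.0 - fade) + clip[:xf] * fade
        out = np.concatenate([out, clip[xf:]])
    return out

# Hypothetical usage: three clips selected by the arrangement, in order.
preview = sequence_clips([np.zeros(44100), np.ones(44100) * 0.1, np.zeros(22050)])
print(preview.shape)  # number of samples in the sequenced preview
```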
When the user is satisfied with their preview, they can request a full render. At this point, the arrangement is used to re-render the entire track (using the music segments in digital musical notation format), instead of simply sequencing the audio clips together. The re-rendered entire track is then delivered to the user. However, in general, the fully-rendered track can be generated in the background, or afterwards, or at some other point.
Reference is made by way of example to United Kingdom Patent Application Numbers 1721215.0, 1721212.3, 1721216.8 and 1802182.4, each of which is incorporated herein by reference in its entirety. There is disclosed therein an AI music production system which is one example of a context in which embodiments of the invention may be implemented. It is however noted that the invention is not limited in this respect and may be implemented in other contexts. A concise description of the relevant architecture of that music production system is given below in order to provide context to the preferred embodiments of the present invention that are described later. Further details may be found in the referred-to applications.
The AI music production system can use AI to compose and/or produce original music. A feature of the above AI music production system is an application programming interface (API) that gives developers access to the full power of the AI composition and production system, allowing a user to automatically create professional quality, customised music at scale. It is noted however that the invention is not contingent on the provision of an API. An API is one mechanism by which a user can provide custom music production parameters (composition, arrangement etc.) for processing within the music production system. However, in general these can be provided by any suitable mechanism (such as a Web interface). It will therefore be appreciated that all descriptions below in relation to the API apply equally to a context in which the music production parameters are provided by some other means.
The described API is an API for audio and MIDI. That is, with the API a user can generate both audio files and their underlying compositions in MIDI format. This description focuses on the audio generation aspects, which allow a user to:
* Select by style, tempo and duration
* Choose from a pre-generated audio library or generate bespoke tracks
* Edit existing tracks
* Generate MP3 and WAV output
A broad range of applications can be powered using the API, including video creation, games, music making, generating music to accompany visual or other content in a variety of contexts, podcasting and content automation.
Some of the benefits include the ability to: empower the user's creative experience with single-click, personalised music, increase user engagement with unique features, return a complete audio track to a platform with just a few lines of code, and provide users with artificially created music that they are free to use without some of the customary restrictions associated with recorded music.
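By way of illustration only, a request to such an API might look like the sketch below. The endpoint, field names and authentication scheme are assumptions made for the example; they are not the actual API described in the referenced applications.

```python
import requests

API_URL = "https://api.example.com/v1/tracks"    # hypothetical endpoint
headers = {"Authorization": "Bearer <api-key>"}  # placeholder credential

# Request a bespoke track by style, tempo and duration, with one sync point.
payload = {
    "style": "cinematic",
    "tempo": 110,
    "duration": 90.0,          # seconds
    "sync_points": [62.5],     # timing of the musical climax
    "format": "wav",
}
response = requests.post(API_URL, json=payload, headers=headers)
track = response.json()
print(track["preview_url"])    # preview render available almost immediately
print(track["render_status"])  # full render may still be in progress
```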
In order to generate a full audio track, the music of the track is first arranged in digital musical notation format and then rendered into audio format. The arrangement can be an arrangement of newly-composed music segments, existing music segments (in an editing context) or a combination of both.
Figure 3 shows a block diagram of the AI music production system which gives a high-level overview of some of its core functions that are described in further detail later.
Herein the term artificial intelligence (AI) is used in a broad sense, and as such covers both machine learning (ML) and also expert (rules-based) systems which are not ML systems, as well as other forms of AI system that are neither ML nor expert systems. Although in the following, specific references are made to ML and expert systems, or combinations thereof, the description applies equally to other forms of AI system.
The system is shown to comprise a composition engine 2 and a production engine 3, which broadly represent two core aspects of the system's functionality. These are shown arranged as layers in a stack, with the composition engine below the production engine to reflect their respective functions. Different possible structures of the stack are described later, but these all broadly follow this division between composition and production.
The composition engine 2 composes segments of music in a digital musical notation format.
Herein a digital musical notation format means a digital representation of a musical score in computer-readable form. One such format is an event-based format, where musical notes are indicated by events with a start time/stop time. Such notations are known. This can be a format in which musical notes are represented as a pitch value and associated timing data denoting the start and end time of the note (or, viewed another way, its start time and duration or "sustain").
The notes can be represented individually or as chords for example.
The pitch value is commonly quantised to musical half-tones, but this is not essential, and the level of quantisation can depend on the type of music. Often other musical data will also be embodied in the format, such as a velocity or pitch modulation of each note. The velocity parameter traces back to acoustic instruments and generally corresponds intuitively to how hard a musical instrument, such as a piano or guitar, should be played (thus controlling musical dynamics). The format is such that it can be interpreted by a synthesiser (such as a virtual instrument), which in effect "plays" the score to create audio, by interpreting the various parameters according to its internal musical synthesis logic. One example of such a format is MIDI, which is a standardised and widely used way of representing scores, but the term applies more generally to other formats, including bespoke formats. The following refers to MIDI segments by way of example but the description applies equally to any other musical notation format. The composition engine preferably operates based on machine learning (ML) as described later.
Herein, the terms "music segment" and "musical segment" are synonymous and refer generally to any segment of music in digital musical notation format unless the format is otherwise specified. Each segment can for example be a musical bar, fraction of a bar (e.g. crotchet, quaver, semi-quaver length segments etc.) or a sequence of multiple bars depending on the context. A music segment can be a segment within a longer musical score. A musical score can be made up of multiple musical parts (corresponding to different performative voices e.g. vocal parts, instruments, left and right hand parts for a particular instrument etc.). In sheet music notation, each part is generally scored on a separate staff (although a chord part for example could be scored using chord symbols), and viewed from this perspective each music segment could correspond to a bar, a fraction of a bar or sequence of bars for one of the parts. This applies equally to MIDI segments, whereby a MIDI segment refers to a music segment in MIDI format. Whilst individual MIDI segments can be embodied in separate MIDI files or data streams, different MIDI segments can be embodied within the same MIDI file or data stream. It is also possible to embody MIDI segments for different musical parts within the same MIDI file or data stream, e.g. using different MIDI channels for different parts, as is known in the art. Accordingly, in the following description, MIDI loops and individual segments of a MIDI loop or part may both be referred to as music segments. It will be clear in context what is being referred to.
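The event-based representation described above can be pictured with a small sketch like the one below. It is an illustrative data structure only, not the MIDI specification itself or a format used by the system.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int             # e.g. MIDI-style pitch number, 60 = middle C
    start_beats: float     # start time of the note
    duration_beats: float  # sustain
    velocity: int          # how hard the note is "played" (musical dynamics)
    part: str              # performative voice the note belongs to

# A one-bar music segment for a chords part: a C major triad held for four beats.
segment = [
    NoteEvent(60, 0.0, 4.0, 80, "chords"),
    NoteEvent(64, 0.0, 4.0, 80, "chords"),
    NoteEvent(67, 0.0, 4.0, 80, "chords"),
]
```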
A core function of the production engine 3 is taking a set of one or more MIDI segments and converting them to audio data that can be played back, referred to herein as audio rendering. A self-contained representation of the track in audio format is referred to herein as an "audio render". This is a complex process in which typically multiple virtual instruments and audio effects (reverb, delay, compression, distortion etc.) are carefully chosen to render different MIDI segments as individual audio data, which are "mixed" (combined) synergistically to form a final "track" having a desired overall musical and sonic effect or "soundscape", where the track is essentially a musical recording. The role of the production engine is analogous to that of a human music producer and the production engine can be configured based on expert human knowledge. However, in use, the production process is an entirely automated process driven by a comparatively small number of selected production parameters. The production engine is also an AI component, and can be implemented either as an expert (rules-based), non-ML system, an ML system or a combination of rules-based and ML processing.
One key service provided by the system is the creation of a piece of music, in the form of an audio track (e.g. WAV, AIFF, mp3 etc.) "from scratch", which involves the composition engine creating MIDI segments that form the basis of the track that is produced by the production engine, by synthesising audio parts according to the MIDI segments that are then mixed in the manner outlined above. This is referred to herein as a "full stack" service.
However, a benefit of the system architecture is its ability to offer individual parts of the functionality of the production engine or the composition engine as services.
One such service is "production as a service", whereby a composer can provide to the system MIDI segments that he has composed, where in this context it is the AI system that assumes the role of producer, creating a finished audio track from those MIDI segments. This offers the functions of the production engine as a standalone service and is essentially the opposite of MIDI as a service. Production as a service is particularly useful for composers who lack production skills or inclination.
Another important service in the present context provides the ability to edit/re-arrange existing tracks/compositions.
All of the services can be accessed via an access component 14 in the form of an application programming interface (API), such as a web API, whereby API requests and responses are transmitted and received between an external device and an API server of the system via a computer network such as the Internet. The access component 14 comprises a computer interface to receive internal and external requests as described later.
Regarding the division between composition and production, although each of these has certain core, defining characteristics, there is some flexibility on where the line is drawn in terms of the system architecture. Ultimately, the system is structured in line with the musical principles according to which it operates.
In simple terms, the traditional process of music creation can be considered in the following stages:
1. Composition
2. Performance (or humanization)
3. Production
Depending on the context, certain forms of composition can be broken up into two relatively distinct sub-stages: element composition and arrangement. Here, element composition refers to the creation of the essential musical elements that make up a track, which are then arranged to create a piece of music with convincing long term structure. These can both fall within the remit of a composer, or they can be quite separate stages, and historically this has been dependent to a certain extent on the style of music. However, in other contexts composition and arrangement can essentially be performed as one. The term "composition" as it is used herein can refer to composition that incorporates arrangement or element composition depending on the context. Performance would traditionally be the elements of variation introduced by a human performer (such as timing/velocity variations etc.), and production the process of capturing that performance in a recording. Over time, however, the lines between these aspects have become more blurred, particularly with more modern electronic music that can be created with no more than minimal human performance, using MIDI sequencing and the like, leading to a greater emphasis being placed on production than performance in some instances. Nowadays, the term production can cover a broad range of things, such as balancing the levels of individual channels, equalization, dynamic control (compression, limiting etc.) and other audio effects (reverb, delay, distortion etc.), the selection of virtual instruments to generate audio for individual channels etc. In terms of the implementation of the AI music production system, the composition, arrangement and performance functions can be implemented as essentially standalone functions of the production engine, which take MIDI segments from the composition engine, and arrange and humanise them respectively. For example, the MIDI segments could be short loops that are strictly time quantised to fractions (e.g. 1/16 or 1/32) of a bar. These can then be arranged (e.g. according to a verse-chorus type structure), and performance can be added by adding a degree of variation (temporal, velocity, pitch etc.) to approximate an imperfect human performance. With this approach, it can be convenient to implement these functions in the production engine, along with the MIDI processing performed as part of the final music production process.
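As an illustration of the performance/humanisation step, the sketch below adds small random timing and velocity offsets to the strictly quantised note events of a MIDI loop. The amounts of jitter are arbitrary assumptions chosen for the example.

```python
import random

def humanise(events, timing_jitter_beats=0.02, velocity_jitter=8):
    """Add small random timing and velocity variation to quantised note events,
    approximating an imperfect human performance."""
    performed = []
    for e in events:
        performed.append({
            "pitch": e["pitch"],
            "start": max(0.0, e["start"] + random.uniform(-timing_jitter_beats, timing_jitter_beats)),
            "duration": e["duration"],
            "velocity": min(127, max(1, e["velocity"] + random.randint(-velocity_jitter, velocity_jitter))),
        })
    return performed

# A strictly quantised loop: four quarter notes on middle C.
loop = [{"pitch": 60, "start": float(b), "duration": 1.0, "velocity": 80} for b in range(4)]
print(humanise(loop))
```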
However, an equally viable approach would be to amalgamate one or both of these functions with the composition itself, whereby the ML composition engine is trained to compose music with convincing long term structure and possibly humanisation, within the composition engine.
Thus arrangement and performance can be implemented in the production engine, the composition engine or a combination of both. In a practical context the architecture of the system will to some extent reflect the approach that is taken to musical composition and arrangement.
It is noted that humanisation in particular is an optional component, and may not be desirable for every type of music (e.g. certain styles of electronica).
Composition Engine: A possible structure of the composition engine 2 is described below. First, certain underlying principles that feed into the design of the composition engine 2 are discussed.
A Probabilistic Sequence Model (PSM) is a component which determines a probability distribution over sequences of values or items. This distribution can either be learned from a dataset of example sequences or fixed a priori, e.g. by a domain expert. By choosing an appropriate dataset or encoding suitable expert knowledge, a PSM can be made to reflect typical temporal structures in the domain of interest, for example, typical chord or note sequences in music.
A PSM can be used to generate sequences according to its distribution by sampling one item at a time from the implied probability distribution over possible next items given a prefix of items sampled so far. That is, each item is selected according to a probability distribution of possible items that is generated by the PSM based on one or more of the items that have been chosen already. In the context of the composition engine, the items are music segments, which may for example correspond to a fraction of a bar (e.g. 1/16, 1/32 etc.) at the level of the composition engine but which can be segments of any length depending on how the PSM is configured. Each music segment can for example correspond to an individual note or chord at a particular point in the sequence.
The probability distribution provides a set of candidate music segments (notes, chords etc.) for selection for a sequence based on one or more music segments that have already been selected for the sequence, and an associated probability value for each candidate music segment, which defines how likely that music segment is to be selected as the next music segment in the sequence. Because the output is probabilistic, this introduces an element of variation whereby the same composition settings can give rise to different compositions (as described below, an additional probabilistic element can also be introduced in selecting the composition settings themselves).
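The sampling loop described above can be sketched very simply with a first-order Markov chain over chord symbols; the transition probabilities below are invented for the example and are not taken from the system.

```python
import random

# Hypothetical first-order PSM: P(next chord | current chord).
transitions = {
    "C":  {"F": 0.4, "G": 0.4, "Am": 0.2},
    "F":  {"C": 0.5, "G": 0.5},
    "G":  {"C": 0.7, "Am": 0.3},
    "Am": {"F": 0.6, "G": 0.4},
}

def sample_sequence(start, length):
    """Sample items one at a time from the distribution over possible next items,
    conditioned on the prefix sampled so far (here, just the previous item)."""
    seq = [start]
    for _ in range(length - 1):
        dist = transitions[seq[-1]]
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        seq.append(nxt)
    return seq

print(sample_sequence("C", 8))   # e.g. ['C', 'G', 'C', 'F', 'G', 'Am', 'F', 'C']
```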
Examples of PSMs include Markov chains, probabilistic grammars, and recurrent neural networks with a probabilistic final layer (softmax etc.).
A Composition Engine (CE) is a system which is able to turn a small number of composition parameters into either a complete musical score or a shorter section of music, possibly with an arbitrary number of parts. A part is understood to be a division of musical material between performative voices, which can then be rendered in distinct ways. This distinction is fundamental in the practice of music production; for example, different musical instruments and spatial parameters can be assigned to each part in order to simulate a physical musical performance.
It may be possible to build a relatively basic composition engine that can provide multiple parts with a single PSM, such as a neural network. That is, by building a single PSM over a complete moment-by-moment description of all aspects of a multi-part composition. Such an approach is viable, however with more complex composition this may necessitate some internal compromises to simplify the model and make it workable. Whilst this may be sufficient in some contexts, other approaches may be beneficial when it comes to more complex and intricate composition.
Accordingly, depending on the level of complexity, it may be appropriate to divide the task between multiple PSMs, each of which has a specialised role, such as focusing on a particular combination of attributes, or a particular kind of part. In that case an important modelling decision is how specific each PSM's scope should be.
Bringing together a loosely coupled collection of PSMs in a modular approach has the potential for great flexibility in how individual requests to the CE can be serviced.
Using the technology described below, it is possible to coordinate each PSM to work coherently with the others, without limiting the capabilities of any individual PSM. That is, these principles provide a solution to the problem of sharing information between multiple PSMs in a flexible way. The main elements of this technology can be summarized as follows:
1. A modular extensible system for working with musical attributes such that they can form part of the input to or output from a PSM.
2. Multiple PSMs responsible for modelling restricted combinations of attributes and/or parts.
3. A mechanism to condition the events sampled from a PSM on attributes produced by another or from an external constraint.
These will now be described in detail.
1. A modular extensible system for working with musical attributes such that they can form part of the input to or output from a PSM
A musical event is a complex object that can be described in terms of a potentially unbounded number of aspects or attributes pertaining to the event, including intrinsic properties such as pitch, duration, vibrato etc., but also the event's relationships with its context, such as the underlying harmony, its position in time, whether a note is higher or lower than the previous note, etc. Focusing on a limited number of these "viewpoints" allows a PSM to focus on capturing the probabilistic structure in certain aspects of musical sequences (in order to obtain a tractable model) whilst leaving others to be dealt with by some other system. Two PSMs can be coordinated by sharing one or more viewpoints; for example, values for a viewpoint can be generated from one PSM and fed in as constraints on the sampling space of the other. This vastly reduces the complexity of the modelling problem. A modular approach to working with viewpoints means that PSMs can easily be created to model arbitrary combinations of viewpoints, whilst ensuring consistent coordination between the PSMs, both during training and generation.
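A viewpoint in this sense can be pictured as a small function that extracts one attribute of a musical event; the sketch below shows a modular registry of such functions. The particular viewpoints and field names are illustrative assumptions rather than the system's actual viewpoint set.

```python
# Registry of viewpoint functions: each maps a musical event (and its context)
# to one attribute value that a PSM can consume or produce.
VIEWPOINTS = {
    "pitch":    lambda event, prev: event["pitch"],
    "duration": lambda event, prev: event["duration"],
    # Contour relative to the previous note: +1 higher, -1 lower, 0 same or no previous note.
    "contour":  lambda event, prev: ((event["pitch"] > prev["pitch"]) - (event["pitch"] < prev["pitch"])) if prev else 0,
    "chord":    lambda event, prev: event["chord"],   # the underlying harmony at this point
}

def view(event, prev, names):
    """Project an event onto a chosen combination of viewpoints."""
    return {n: VIEWPOINTS[n](event, prev) for n in names}

e_prev = {"pitch": 60, "duration": 1.0, "chord": "C"}
e_cur  = {"pitch": 64, "duration": 0.5, "chord": "C"}
print(view(e_cur, e_prev, ["pitch", "contour", "chord"]))  # {'pitch': 64, 'contour': 1, 'chord': 'C'}
```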
2. Having multiple PSMs responsible for modelling restricted combinations of attributes and/or parts.
A "divide and conquer" approach to solving the complex composition problem is to provide specialised PSMs for particular musical attributes (in particular styles). E.g. one PSM may specialise in producing chord symbols with durations, and another might specialise in chord symbols and melody note pitches and durations. This means that each PSM can focus on modelling its combination of attributes accurately, leading to high-quality, musically convincing output. The loose coupling of PSMs means that they can be used freely in combinations chosen at the point of servicing a composition request, allowing the system to be flexible in the choice of numbers and kinds of parts that can be generated for one composition.
3. Ability to condition the events sampled from a PSM on attributes produced by another.
Certain PSMs can be used in a way which allows the outputs of one to be the (perhaps partial) inputs of another. For example, a PSM over melody notes with chord symbols could be conditioned to match the chord symbol produced by a different PSM. This promotes coherence between parts, and allows the composition engine 2 to take advantage of the modularity of the multiple PSM approach without sacrificing musical quality.
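Conditioning one PSM on another can be sketched as restricting (and renormalising) the first model's candidate distribution to events that agree on the shared viewpoint; the distribution below is invented for the example.

```python
# Candidate melody notes with the chord symbol each was modelled against,
# as might be produced by a melody-and-chords PSM (probabilities are illustrative).
melody_candidates = {
    ("E4", "C"): 0.30, ("G4", "C"): 0.25, ("C5", "C"): 0.15,
    ("F4", "F"): 0.20, ("A4", "F"): 0.10,
}

def condition_on_chord(candidates, chord):
    """Keep only candidates whose chord viewpoint matches the chord produced
    by the other PSM, and renormalise the remaining probabilities."""
    kept = {note: p for (note, c), p in candidates.items() if c == chord}
    total = sum(kept.values())
    return {note: p / total for note, p in kept.items()}

print(condition_on_chord(melody_candidates, "C"))
# {'E4': 0.4285..., 'G4': 0.3571..., 'C5': 0.2142...}
```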
Figure 4 shows further details of one possible configuration of the composition engine 2 according to the principles set out above. In this case, the task is divided between multiple neural networks but these could be other forms of PSM as indicated.
The composition engine 2 is shown having an input 402 and an output 404, which are an internal input and output respectively. The composition engine input 402 is configured to receive requests for MIDI segments, each having a job identifier (ID) assigned as described below.
A key function of the composition engine is generating musically cooperating music segments for different musical parts, which are structured to be performed simultaneously to create a coherent piece of music. The MIDI segments can be MIDI "loops" which can be looped (repeated) in order to build up a more complex track. If different MIDI loops are provided for different musical parts, these can be looped simultaneously to achieve the effect of the parts playing together. Alternatively, multiple parts can be captured in a single MIDI loop. However, the principles can be extended such that the composition engine 2 provides longer sections of music, and even a complete section of music for each part that spans the duration of the track.
Music segment(s) for multiple musical parts can be requested in a single job request. Where different passages of music are requested separately (e.g. verse and chorus), these can be requested by separate job requests, though the possibility of requesting such passages of music in a single job request (e.g. requesting verse and chorus together) is also viable. These job request(s) correspond to the job requests of Figure 2 (described below), but are labelled 406a, 406b in Figure 4. Note that these job requests could be received directly from an external input of the access component (see Figure 1, below), or be received as an internal job request as explained with reference to Figure 2. Each job request comprises the job ID and a set of musical composition parameters, which in this example are:
Field Name      Type     Description
num_measures    Number   Length in measures (bars) of the MIDI loop to generate: either 1, 2, 4, or 8
style           String   Musical style: one of a predetermined set of possible styles (e.g. piano, folk, rock, cinematic, pop, chillout, corporate, drum_and_bass, ambient, synth_pop)
tonic           Number   Musical tonic (key): [0 - 11], with 0 = C
tonality        String   One of [natural_major, natural_minor]

As noted, not all of these composition parameters are essential, and other different types of composition parameter can be defined in different implementations. A key aspect of the system is that a user is able to define the style they want (alternatively the system can select the style autonomously where it is not specified - see below), and the composition engine 2 can provide compositions in different styles according to the architecture described later.
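An example job request carrying these parameters might look like the sketch below; the message structure and job ID format are assumptions made for illustration, and only the four parameter fields come from the table above.

```python
job_request = {
    "job_id": "a1b2c3d4",           # assigned job identifier (format assumed)
    "composition_parameters": {
        "num_measures": 4,           # length in bars of the MIDI loop: 1, 2, 4 or 8
        "style": "cinematic",        # one of the predetermined styles
        "tonic": 0,                  # 0 = C
        "tonality": "natural_minor",
    },
}
```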
The composition layer 2 is shown to comprise a plurality of composition modules, labelled 408A and 408B. Each composition module is in the form of a trained neural network, each of which has been trained on quite specific types of musical training data such that it can generate music in a particular style. In the following examples the composition modules are referred to as networks, but the description applies equally to other forms of ML or PSM composition module.
The composition parameters in each job request 406a, 406b are used both to select an appropriate one of the networks 408A, 408B and also as inputs to the selected network. In this example, each of the predetermined styles is associated with a respective plurality of networks. By way of example, Figure 4 shows the first networks 408A associated with a first style (Style A) and the second networks 408B associated with a second style (Style B).
Within each style group 408A, 408B, suitable networks can be selected for the task at hand. As will be appreciated, the manner in which networks are selected will depend on how those networks have been optimised in accordance with the principles set out above.
For each job request 406a, 406b, a composition controller 408 of the composition engine 2 selects an appropriate subset of the networks to service that job request. The network subset is selected on the basis that it is associated with the musical style specified in the job request.
As noted, multiple parts - such as chords and melody - can be requested in the same job request. This applies both to internal and external requests to the composition engine 2.
Once generated, the MIDI segment(s) generated in response to each job request 406a, 406b are stored in a job database (24, Figure 1) in association with the assigned job ID. Alternatively, MIDI segments could be stored in a separate database, and all description pertaining to the job database in this context applies equally to the separate database in that event.
With reference to Figure 5, networks associated with a particular style cooperate to produce a plurality of musically cooperating elements. This is achieved by providing outputs of the networks as input to other networks in a hierarchical relationship.
To illustrate this underlying principle, Figure 5 shows three networks associated with Style A: chord (CN), melody (MN) and harmony (HN), which correspond to the first networks 408A in Figure 4.
In this example, each of the networks CN, MN and HN is shown configured to receive as inputs composition parameters 502 determined by the composition controller 408 of the composition engine 2 in the manner described above. Although shown as receiving the same input, the networks need not receive exactly the same parameters; each can, for example, receive a different selection of the composition parameters.
The chords network CN is configured to generate a chord sequence (progression) 504 based on the parameters 502. This need not be MIDI, and could for example be a symbolic chord representation, but it may be convenient (though not essential) to convert it to MIDI for subsequent processing. The generated chord sequence is stored in the job database in association with the applicable job ID.
In addition, the melody network MN receives, as input, the generated chord sequence 504 and generates a melody 506, based on the chord sequence 504 and the composition settings 502, to accompany the chord sequence in a musical fashion. That is, the melody 506 is built around the chord progression 504 in the musical sense. The generated melody 506 is also stored in the job database 24 in association with the applicable job ID.
In addition, the melody 506 is inputted to the harmony network HN. The harmony network HN generates, based on the composition settings 502 and the melody 506, a harmony 508 which it outputs as a MIDI segment, and which is a harmonization of the melody 506 in the musical sense. Although not shown in Figure 5, it may be appropriate for the harmony network HN to also receive the chord sequence 504 as input, so that it can harmonize the melody 506 and also fit the harmony 508 to the chord sequence 504. The generated harmony 508 is also stored in the job database 24 in association with the applicable job ID.
The chord sequence 504, melody 506 and harmony 508 can be requested in the same job request, and in that event are stored together in the job database 24 in association with the same job ID.
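The hierarchical flow just described - chords feeding the melody network, and the melody (optionally with the chords) feeding the harmony network - can be sketched as follows. The network objects and their generate() signatures, and the job database store() call, are hypothetical stand-ins for the trained networks CN, MN and HN and the job database 24; only the ordering of the data flow is taken from the description above.

```python
def compose_segments(chord_net, melody_net, harmony_net, params, job_db, job_id):
    """Sketch of the chords -> melody -> harmony hierarchy of Figure 5.

    chord_net, melody_net and harmony_net are placeholders for CN, MN and HN;
    their generate() methods and the job_db.store() call are assumptions.
    """
    chords = chord_net.generate(params)                     # chord progression 504
    melody = melody_net.generate(params, chords=chords)     # melody 506 built around the chords
    harmony = harmony_net.generate(params, melody=melody,   # harmony 508: a harmonisation of the
                                   chords=chords)           # melody, optionally fitted to the chords
    # Each output is stored against the same job ID so it can be recalled later.
    job_db.store(job_id, {"chords": chords, "melody": melody, "harmony": harmony})
    return chords, melody, harmony
```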
The output of each network can be, but need not be, MIDI - it could be some other digital musical notation format, such as a bespoke format (see above). It may be convenient, where the output is not MIDI, to convert it to MIDI later, but this is not essential.
Networks can also take, as input, external MIDI, such as a user-generated or library MIDI segment, and compose around this.
Another example of input that a network can compose to is percussion, which can be user or ML generated. Here, the percussion can for example drive the rhythm of the composed segments, or the emphasis that is placed on certain notes (where emphasis/velocity is handled at the composition engine 2).
Full Stack: Figure 1 is a schematic block diagram illustrating one possible configuration of a music production pipeline 1 of the music production system. The music production system is organised into four layers or components. It will be evident from the following that there may be some overlap between functionality of the individual layers or components, but the following description illustrates clearly how the generation of a piece of music is organised in the music production system. The music production system operates to receive a group of settings, which will be described in more detail later, and generates a piece of music. In the following, a piece of music is referred to as a 'track', but it will be understood that the system can produce music of any length / character. The track may be generated as a musical score in a digital musical score notation, such as MIDI, or in audio. Where score formats other than MIDI are used, it may be convenient (but not essential) to convert them to MIDI for later processing. For this reason a conversion layer (not shown) may be provided within the system which converts a notation score into MIDI. It will be appreciated that this conversion layer could form part of the composition engine itself or could form part of another layer in the system that could receive a score and convert it to MIDI for the purpose of using MIDI.
A production management component (controller) 13 manages the layers of the system in the manner described below. The controller 13 handles both internal and external requests, and instigates functions at one or more of the layers as needed in order to service each request.
Reference numeral 2 denotes the composition engine. The composition engine operates to receive a group of settings, which will be described in more detail later, and generates MIDI segments to be arranged and produced into a track. It generates segments of music in a symbolic format, to be arranged and produced into a track. It uses a collection of PSMs to generate the segments of music. These PSMs have been trained on datasets of music tracks chosen to exemplify a particular musical style. The composition engine determines which PSMs to employ on the basis of the input settings.
Reference numeral 4 denotes an arrangement layer. The arrangement layer has the job of arranging the MIDI segments produced by the composition engine 2 into a musical arrangement. The arrangement layer can be considered to operate in two phases. In a first phase, it receives arrangement parameters, which will be described later, and produces from those parameters a musical arrangement, as an envelope defining timing and required sequences etc. The arrangement functionality of the arrangement layer is marked 6. This envelope defines the musical arrangement of a piece. As will be described in more detail later, these settings can be used to request MIDI segments from the composition engine 2, through the production manager. A second phase of the arrangement layer is the sequencing function 8. According to the sequencing function, MIDI segments are sequenced according to the arrangement envelope into a finished piece of music. The MIDI segments may be provided by the composition engine (as mentioned earlier), or may be accessed from a pre-existing library of suitable MIDI segments, which can be generated in advance by the composition engine 2. The production management component 13 may for example check the library to see if suitable pre-existing MIDI is available, and if not instigate a request to the composition engine 2 to generate suitable MIDI. Alternatively, the library check can be performed at the composition engine 2 in response to a request, or alternatively the library check can be omitted altogether. Further, MIDI segments may be introduced by an external user as will be described in more detail later. The arrangement layer 4 provides an arranged piece of music in MIDI form. In some situations, this 'raw' piece of music might be suitable for some purposes. However, in other circumstances, it will not be playable in any useful form. Therefore, a performance layer 10 is provided which adds performance quality structure to the piece of music produced by the arrangement layer 4.
There is a decision tree in the arrangement section which operates based on incoming settings. This decision tree embodies human expertise, namely that of a human music producer. The arrangement layer generates a musical arrangement structure using the settings, which has a set of time sequenced sections for which it then requests MIDI from the composition engine (or elsewhere, e.g. from a library), and which in turn are sequenced according to the arrangement structure.
It is noted again that this is just one example of how long-form structure can be created for a piece of music. As an alternative to this separate arrangement layer, which operates 'agnostically' of the MIDI to be sequenced, arrangement could be handled as part of the composition itself, in the composition engine 2.
The performance layer outputs a performance quality piece of music in MIDI. There are many applications where this is useful. However, similarly, there are other applications where an audio version of the piece of music is required. For this, an audio rendering layer 12 (audio engine) is provided which outputs a performance quality piece of music rendered in audio.
The conversion or rendering of a piece of music from MIDI to audio can be done in a number of different ways, and will not be described further, as suitable techniques are known in the art.
As noted, the music production engine has an access component 14 which can be implemented in the form of an API (application programming interface). This access component enables communication within the music production system (in particular, the production management component 13 can communicate with the composition engine 2 via the access component 14 - see below), and also enables functionality to be provided to external users. For the sake of illustration, the side of the access component 14 facing the music production system will be considered to be responsible for internal routing between the layers via the production management component, whereas the side facing away will be responsible for inputs and outputs from an external user. It will be appreciated that this is entirely diagrammatic and that the API could be implemented in any suitable way. As is known in the art, an API is implemented using a piece of software executing on a processor within the API to implement the functions of the API.
The API has at least one external input 16 for receiving job requests from an external user and at least one external output 18 for returning completed jobs to an external user. In addition, in some embodiments, the API enables communication between the internal layers of the music production system as will be described. Jobs which can be requested at the input 16 include the following.
A request for tags can be input by a user, which retrieves a list of tags which are usable in providing settings to create a musical track. Tags include musical styles such as piano, folk et cetera. A full list of tags is given below by way of example only. Tags are held in a tags store 20. Such a request can also be used to request settings that are useable within the system if desired.
Different types of tag can be defined, such as mood and genre tags. Examples of genre tags include: Piano, Folk, Rock, Ambient, Cinematic, Pop, Chillout, Corporate, Drum and Bass, Synth Pop. Examples of mood tags include: Uplifting, Melancholic, Dark, Angry, Sparse, Meditative, Action, Emotive, Easy listening, Tech, Aggressive, Tropical, Atmospheric.
It may be that the system is configured such that only certain combinations of genre and mood tags are permitted, but this is a design choice. Note that this is not an exhaustive list of tags; any suitable set of tags can be used, as will become apparent in due course when the role of the tags in selecting composition and production settings within the system is described.
A library query can be provided at the input 16; the library query generates a search returning a paginated list of audio library tracks which are held in a tracks store 22, or alternatively in the jobs database 24. These can be stored in an editable format which is described later. These are tracks which have already been created by the music production system or uploaded to the library from some other place. They are stored in a fashion which renders them suitable for later editing, as will be described in the track production process.
The library query for tracks returns the following parameters:
* Job ID - this is a unique identity of a track which has been identified, and in particular is the unique ID allowing the track to be edited
* Tags - these are tags associated with the track identifying the style
* Assets - this denotes the type of asset, i.e. MIDI or WAV
* Duration - this denotes the length of the piece of music. In song creation, the length of a piece of music is generally around 3 minutes. However, pieces of music may be generated for a number of purposes and may have any suitable duration.
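As an illustration only, a single entry in the paginated library response might look as follows. The field names and value types are assumptions based on the parameters listed above; the actual serialisation is an implementation detail.

```python
# Hypothetical library query result entry; field names are illustrative only.
library_entry = {
    "job_id": "a7f3c2e1",            # unique track identity, usable for later edit requests
    "tags": ["cinematic", "uplifting"],
    "assets": ["MIDI", "WAV"],       # type(s) of asset stored for the track
    "duration": 180,                 # seconds; around 3 minutes is typical for a song
}
```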
As will be appreciated, these are just examples, and the request can return different parameters in different implementations.
The input 16 can also take requests to create jobs. The jobs can be of different types.
A first type of job is to create an audio track. To achieve this job, the user may supply a number of audio track create settings which include:
* Musical style
* Duration - the length of the track
* One or more tags - defining the style of the track
* Tempo - the musical tempo of the track
* Sync points - any particular place where there is to be a concentration of intensity in the track or other events, such as specific instrument entries at specific points or any other events that lend musical character to the track
* Intensity curve - a generalization of sync points that allows desired intensity variations in the track to be defined with greater flexibility as a curve over time.
Note that not all of these parameters are required. The system is capable of making some autonomous decisions based on minimal information. For example, the system is capable of creating an audio track if it is just supplied with the duration. The production management component 13 itself will determine tags, tempo and sync points in that event. In fact, the system is capable of generating a track with no input settings - any of the settings can be selected autonomously by the system if they are not provided in the track request.
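The following sketch illustrates how the production management component might fill in any settings the user omits. The helper name, the default values and the use of random selection are assumptions for illustration; only the behaviour - autonomous selection of missing tags, tempo and sync points - comes from the description above.

```python
import random

DEFAULT_DURATION = 90  # seconds; an assumed default (see the audio engine description below)

def resolve_track_settings(request: dict) -> dict:
    """Fill in any audio-track creation settings not supplied by the user (sketch)."""
    settings = dict(request)
    settings.setdefault("duration", DEFAULT_DURATION)
    # Tags, tempo and sync points may all be chosen autonomously if absent.
    settings.setdefault("tags", [random.choice(["piano", "folk", "rock", "cinematic", "pop"])])
    settings.setdefault("tempo", random.choice([90, 100, 110, 120]))
    settings.setdefault("sync_points", [])
    return settings
```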
The production management component can also generate settings for one or more than one of the layers based on the musical style. When generating a complete track this involves generating, based on the style, both audio production parameters for the audio production engine 3 and composition parameters for the composition engine 2, as described in more detail below.
In the following, certain parameters may be referred to as required. As will be appreciated, this simply refers to one possible implementation in which these parameters are made required parameters as a design choice. There is however no fundamental requirement for any of the parameters to be provided by a user, as it is always possible to configure the system to autonomously select any desired parameter that is not provided by a user.
A request for an audio track involves use of all of the components of the music production system, including the audio rendering layer, to produce a track rendered in audio. In this example, a request to create a MIDI track uses the composition engine, the arrangement layer and performance layer to produce a track in MIDI. It does not use the audio rendering layer. As noted, the arrangement layer and performance layer are optional components and the system can be implemented without these. For example, the composition engine 2 can be configured to generate fully-arranged MIDI with humanization where desired.
Track production is described later.
A second type of request is to edit an existing audio track. Tracks are stored in a track library identified by unique job identifiers, in the manner described below. A user must supply the ID of the job to edit. Note that this could be achieved by carrying out the library query mentioned earlier in order to identify the correct job ID for the track that needs to be edited. The user can provide a new duration for the track. Optionally, the tempo and sync points can be defined. The output of this is a new version of the existing track, edited as defined by the new settings.
Alternatively, the existing duration can be used if the user does not wish to change the duration and wishes to edit some other aspect(s) of the track (or the system could even be configured to select a duration autonomously if none is provided but a change of duration is nonetheless desired). The system is able to handle edit requests because sufficient information about the decisions made by the system at every stage is stored in the job database 24 against the track ID as described below.
The system may also be equipped to handle requests to edit a MIDI track as described later. These can be handled in much the same way as audio track edit requests, but the resulting output is MIDI rather than audio.
A track creation task will now be described with reference to Figure 2. In Figure 2, numbers in circles represent steps of a method, and are distinct from reference numerals denoting particular elements of the structure. Elements of the structure shown in Figure 2 correspond to those discussed in Figure 1 and are marked with reference numerals corresponding to those in Figure 1. A human user can provide a job request 30 in step 1 at the input 16 of the API 14. The job request 30 can in principle be any of the job types which have been described above, but the present part of the description relates to creation of an audio track or MIDI track. The job request 30 defines at least one parameter for defining the creation of those tracks, as described above. Alternatively, as noted, the job request 30 may define no parameters, and all parameters may in that event be selected autonomously by the system. At step 2, within the API 14, a job identifier is assigned to the job request 30. This is referred to herein as ID A. The job is then assigned to the production job queue 32 which is associated with the production manager 13. The allocation of the job ID A to the production queue is denoted by step 3.
At step 4, the production manager operates to produce a track. The production manager 13 has access to the arrangement layer 4, the performance layer 10 and the audio rendering layer 12.
Note that in Figure 2 the performance layer is not shown separately but is considered to be available to the production manager as needed. The production manager 13 operates in association with the arrangement layer 4 according to an artificial intelligence model embodied in the production layer. This can be embodied by a decision tree which incorporates human expertise and knowledge to guide the production layer through production of a track; however, other implementations are possible. For example, as noted already, the production engine can be implemented using ML. This decision tree causes the production manager 13 to access the arrangement layer 4 as indicated at step 5. The arrangement layer 4 operates to provide a musical arrangement which consists of at least timing and desired time signature (number of beats in a bar) and returns an arrangement envelope to the production manager 13 as shown in step 5a. The production manager 13 is then activated to request MIDI segments which will be sequenced into the arrangement provided by the arrangement layer 4. As indicated above, this is just one possible implementation that is described by way of example. In particular, as noted above, the system can be implemented without one or both of the arrangement layer 4 and performance layer 10, with the functions of these layers, when desired, handled elsewhere in the system, e.g. incorporated into the operation of the composition engine 2. This request can also be applied through an API input, referred to herein as the internal API input 17. For example, the production manager 13 can generate a plurality of MIDI job requests; these are shown in Figure 2 labelled B1, B2, B3 respectively. Each of the MIDI job requests is applied to the internal input 17 of the API 14. The API 14 assigns job identifiers to the MIDI job requests, indicated as ID B1, ID B2 and ID B3, and these jobs labelled with the unique identifiers are supplied to the MIDI jobs queue 34 in step 8. The identifiers are returned to the production manager 13. This is shown by step 7.
The jobs with their unique identifiers are assigned to the composition engine 2, which can generate individual MIDI segments using artificial intelligence/machine learning. The composition engine has been trained as described above.
The composition engine 2 outputs MIDI segments as indicated at step 9 into the job database 24. The MIDI segments could be stored in a separate database or could be stored in the same job database as other completed jobs to be described. Each MIDI segment is stored in association with its unique identifier so that it can be recalled. The production manager 13 periodically polls the API 14 to see whether or not the jobs identified by ID B1, ID B2 and ID B3 have been completed, as described in the next paragraph. This is shown at step 10. When they are ready for access, they are returned to the production manager 13, which can supply them to the arrangement layer for sequencing as described above. The sequenced segments are returned via the production manager 13 either to an output (when a MIDI track is desired), or to the audio rendering layer 12 (step 12) when an audio track is required.
Assigning job IDs in this way has various benefits. Because the job ID is assigned to a request when that request is received, a response to that request comprising the job ID can be returned immediately by the API 14 to the source of the request, before the request has actually been actioned (which, depending on the nature of the request, could take several seconds or more, particularly in the case of audio). For example, a request for audio or MIDI can be returned before the audio or MIDI has actually been generated or retrieved. The source of the request can then use the returned job ID to query the system (repeatedly if necessary) as to whether the requested data (e.g. audio or MIDI) is ready, and when ready the system can return the requested data in response. This avoids the need to keep connections open whilst the request is processed, which has benefits in terms of reliability and security.
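This submit-then-poll pattern can be sketched from the client side as follows. The client object and its submit(), poll() and fetch() methods are hypothetical names for calls on the access component; the point illustrated is only that the job ID is returned immediately and the result is fetched later without holding a connection open.

```python
import time

def create_and_fetch_track(api, settings, poll_interval=2.0):
    """Illustrative client flow: submit a job, then poll until it completes.

    'api' is a hypothetical client for the access component 14; submit()
    is assumed to return the assigned job ID at once, before production.
    """
    job_id = api.submit("create_audio_track", settings)   # response with job ID comes back immediately
    while True:
        status = api.poll(job_id)                         # ask whether the job is ready yet
        if status["ready"]:
            return api.fetch(job_id)                      # retrieve the completed track
        time.sleep(poll_interval)                         # no open connection in the meantime
```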
Audio Engine: There now follows a description of how audio is rendered in the music production system described herein. Reference is made to Figures 1, 2 and 6. Figure 6 shows a flow chart for the process that eventually results in a full audio render of a desired track. A request for an audio track is one of the job types mentioned above which can be received at the input 16 of the API 14. In this context, the API provides a computer interface for receiving a request for an audio track. In this connection, an audio track is an audio rendered piece of music of any appropriate length. It is assumed that it is a completed piece of music in the sense that it can be rendered in audio data and listened to as a complete musical composition. The incoming request is assigned a job ID. As mentioned above, the request can include one or more parameters for creating an audio track. Note that, as also mentioned before, it is possible to request a track without supplying any track creation parameters, in which case the system can use a default track creation process, involving for example default parameters. Such default parameters would be produced at the production management component 13 responsive to the request at the input 16. For example, a default duration may be preconfigured at 90s. Other default lengths are possible. Based on the request, multiple musical parts are determined. These may be determined at the production management component 13 based on input parameters in the request supplied at the input 16, or from parameters generated by the production management component. Alternatively, the musical parts may be provided in the request itself by the user making the request. In this case, musical parts may be extracted from the request by the production management component 13. This provides the music production system with extensive flexibility. That is, it can either work with no input from a user, or with many constraints supplied by a user, including track creation parameters and/or musical parts. The determination of musical parts is shown in step S602. Audio production settings are also generated from the request. This is shown in step S603. Note that steps S602 and S603 could be carried out in sequence or in parallel. They may be carried out by the production management component, or any suitable component within the music production system.
The audio production settings and musical parts are supplied to the audio rendering component at step S604. In addition, a sequence of musical segments in digital musical notation format is supplied to the audio rendering component. This sequence is generated by the composition engine or obtained elsewhere and is in the form of MIDI segments. These MIDI segments can be generated as described earlier in the present description, although they do not need to be generated in this way. Furthermore, it will be appreciated that an arranged sequence of MIDI segments could be supplied to the audio rendering component 12. This arranged sequence could be derived from the arrangement component 4 as described earlier, or could be an arranged sequence generated by a combined composition and arrangement engine. Alternatively, an arranged MIDI sequence could be provided by the user who made the audio track request.
The audio rendering component 12 uses the audio production settings, the musical parts and the MIDI sequence to render audio data of an audio track at step S605. At step S606, the audio track is returned to the user who made the request through the output port 18 of the API component.
A more detailed description will now be given of step S603, in which the audio production settings are chosen. The production management component 13 uses one or more tags to access a database of settings labelled 23 in Figure 1. The tag or tags may be defined in the request which is input at the input 16, or may be generated by the production management component from information in the input request, or generated autonomously at the production management component.
For example, if a style parameter is defined in the request, tags appropriate to that style parameter can be requested from the tags database 20. Alternatively, one or more tags may be selected at random by the production component 13. The structure of the database of settings 23 is shown in Figure 7. The database 23 is queryable using tags, because each arrangement settings database object is associated with one or more of the tags. There is no limit to the number of tags which may be associated with a single arrangement settings object. The database of arrangement settings objects can be queried by providing one or multiple tags and returning all arrangement settings objects which are marked with all of the provided tags. An arrangement settings object 01 is shown in the database 23 associated with tags T1 and T2, but the object 01 can be associated with any number of tags. Each arrangement settings object comprises three groups of settings. There is a group of arrangement settings 70, a group of composition settings 72 and a group of audio settings 74. This is just an example and there can be more or fewer groups of settings. As will be appreciated, the grouping of the settings reflects the architecture of the system, which can be designed flexibly as noted. For example, arrangement settings 70 may be incorporated in the composition settings 72 where arrangement is handled as part of composition.
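A minimal sketch of the tag-based query over the settings database 23 follows. The storage layout (an iterable of objects each carrying a tag list and three settings groups) is an assumption made for illustration; the matching rule - return every object marked with all of the provided tags - is taken from the description above.

```python
def query_settings_objects(settings_db, tags):
    """Return all arrangement settings objects marked with *all* of the given tags.

    settings_db is assumed to be an iterable of objects of the form:
    {"tags": [...], "arrangement": {...}, "composition": {...}, "audio": {...}}
    """
    wanted = set(tags)
    return [obj for obj in settings_db if wanted.issubset(obj["tags"])]
```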
The groups have been defined to co-operate in a finished musical piece in accordance with the style indicated by the tag(s). As described already, tags can define such things as genre/mood/instruments. The settings recalled by the production management component 13 from the settings database 23 are used to control production of the music. A particular collection of settings can be selected from each group for each musical part, or one or more of the settings may apply to multiple musical parts. Reference is made to Figure 8 to show the selection flow for audio production. An instrument is selected for each part from the group of audio settings for the particular tag or tags. This is denoted by crosshatching in Figure 8. One way of selecting the instrument for each part is to select it randomly from the group of settings appropriate to that part. Within the audio settings there may be a category of settings associated with each part, for example bass, melody, harmony et cetera.
A particular sound for the instrument is chosen by selecting a setting from a group of sound settings. This selection may be at random. One or more audio effects may be selected for each sound. Once again, this may be selected at random from a group of audio effects appropriate to the particular sound. In order to implement these selections, the production management component 13 uses a decision tree in which knowledge about the suitability of particular instruments for particular parts, particular sounds for particular instruments, and particular audio effects has been embedded.
The term "sound" in this context means a virtual instrument preset. Virtual instrument is a term of art and means a software synthesiser, and a virtual instrument preset refers to a particular virtual instrument preferably together with a set of one or more settings for configuring that virtual instrument. The virtual instrument preset defines a particular virtual instrument and the timbre or sonic qualities of the virtual instrument. Different virtual instrument presets can relate to the same or different virtual instruments. E.g. for a virtual instrument which emulates a piano, there might he a preset which makes the virtual instrument sound like a grand piano, and another which makes it sound like an upright piano. It is these presets that the system selects between when choosing the sound for an instrument. It can be convenient to bundle the settings that make up a virtual instrument present into a single file.
The composition settings associated with the tag can be supplied to the composition engine 2 for controlling the output of MIDI segments to incorporate into the track. The arrangement settings 70 associated with the tag can be applied to the arrangement layer 4 for use in determining how the MIDI segments from the composition engine should be arranged as governed by the tag.
Finished tracks are stored in the job database 24 in connection with the job ID that was assigned to the incoming request.
The track may be stored in terms of the settings (track settings 80) which were selected to generate it, along with the sequenced MIDI and/or the un-sequenced MIDI loop(s) or other segment(s) output from the composition engine 2, instead of as the audio data itself. Then, this sequenced MIDI can be supplied to the audio rendering component 12 with the musical parts and the selected audio production settings (as in step S604 of the flow of Figure 6) to regenerate the track. The track settings 80 are made up of not only the selected audio settings, but also the composition settings and arrangement settings. That is to say, the track settings 80 contain all of the choices made by the production management component 13 and thus all of the settings needed to completely reproduce a track. In order to reproduce an identical track, these stored track settings 80 can be used at step S604 in Figure 6 to create a duplicate track. In this context, the track settings 80 are referred to as reproducibility settings.
Returning to Figure 2, in the context of a request for a track, the assigned job ID (ID A) constitutes an identifier of the track. The track settings 80 are stored in the job database 24 in association with the track identifier ID A. In addition, the identifiers ID B1, ID B2 and ID B3 are stored in the job database 24 in association with the track identifier ID A, such that the pieces of MIDI used to build the track can be retrieved using the track identifier ID A. These can be sequenced or un-sequenced MIDI segments, or a combination of both. The information stored in the job database 24 in association with ID A is sufficiently comprehensive that the track can be reproduced using that information at a later time.
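The stored record might therefore resemble the following sketch. The schema and the example setting values are assumptions; only the grouping of settings (arrangement 70, composition 72, audio 74) and the references to the MIDI job IDs follow the description above.

```python
# Illustrative record stored against track identifier ID A in the job database 24.
track_record = {
    "track_id": "ID_A",
    "track_settings": {                                        # reproducibility settings 80
        "arrangement_settings": {"time_signature": "4/4"},     # group 70 (example values)
        "composition_settings": {"style": "cinematic"},        # group 72 (example values)
        "audio_settings": {"instruments": ["piano", "strings"]},  # group 74 (example values)
    },
    "midi_job_ids": ["ID_B1", "ID_B2", "ID_B3"],  # MIDI segments used to build the track
}
```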
An example process for editing an existing track will now be described with reference to Figure 10, which shows an edit request 52 being received at the API 14 in step S1102. The edit request 52 is shown to comprise a job ID 54 of a track to be edited and at least one new setting 56 according to which the track should be edited. An edit request is in effect a request to create a brand new track, but doing so using at least one of the settings and/or MIDI segments that were used to generate an earlier track. The track being edited can be an audio track or a MIDI track.
At step S1104, a response 59 to the edit request 52 is returned to a source of the request 52.
The response 59 comprises a job ID 58 which is a job ID assigned to the edit request 52 itself. Note that this job ID 58 of the edit request 52 itself is different to the job ID 54 of the track to be edited, which was assigned to an earlier request that caused that track to be created (this earlier request could have been a request to create the track from scratch or could itself have been a request to edit an existing track). At step S1106 the edit request 52 is provided to the production management component 13 in the manner described above. Using the job ID 54 of the track to be edited, the production manager 13 queries (S1108) the job database 24 using the job ID 54 in order to retrieve the track settings 80 associated with the job ID 54, which it receives at step S1110. Where the track settings 80 comprise one or more references to MIDI segments used to create the track, these can also be retrieved by the production manager 13 if needed. As noted, such references can be in the form of job IDs where the MIDI segments are stored in the jobs database 24, or they can be references to a separate database in which the MIDI segments are held. From this point, the method proceeds in the same way as described with reference to Figure 6, but for the fact that the track settings used to create the edited version of the track are a combination of one or more of the track settings 80 retrieved from the job database 24 and the one or more new settings 56 provided in the edit request 52.
One example of a new setting 56 is a track duration, which a user can provide if he wants to create a longer or shorter version of an existing track. In a simple case, all of the original track settings 80 can be used to create the edited version of the track, along with the original MIDI segments, but with the original duration replaced by the new duration. Alternatively, new MIDI segments could be composed that are more suitable for the new duration, which involves an internal request to the composition engine 2. This is just a simple example and more complex track editing is envisaged. Note that, although in the example of Figure 10 the one or more new settings 56 are provided in the edit request 52, in a more complex scenario the production manager 13 may in fact select such new setting(s) 56 itself in response to the edit request 52, for example by selecting additional settings based on a setting indicated in the edit request 52 or by selecting new setting(s) autonomously by some other means.
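For the simple case just described, the combination of stored and new settings can be sketched as a plain overlay. The helper name is hypothetical, and the real production manager may make richer decisions (e.g. recomposing MIDI for a new duration); the sketch only illustrates that new values override stored ones while everything else is reused.

```python
def build_edit_settings(stored_settings: dict, new_settings: dict) -> dict:
    """Combine the retrieved track settings 80 with the new setting(s) 56 (sketch).

    In the simple case a new value (e.g. a new duration) simply replaces the
    stored one; all other stored settings are reused for the edited track.
    """
    merged = dict(stored_settings)
    merged.update(new_settings)
    return merged
```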
As shown at step S1112 in Figure 10, the job ID 58 assigned to the edit request 52 is stored in the job database 24 in the same way as for other requests, along with the track settings for the edited track, which are labelled 80'. The track settings 80' are the settings that have been used to generate the edited version of the track and, as noted, these are made up of a combination of one or more of the original track settings 80 with the new setting(s) 56 determined in response to the edit request 52 in the manner described above.
The audio rendering functions referred to above are used in the present context to generate what is referred to herein as a "full" (desired) audio render of an arranged track. This may be a custom arrangement of an existing track which a user has caused to be created by providing arrangement parameters such as duration, sync points/intensity curve etc. In order to generate the full render, MIDI is arranged, sequenced and automated in accordance with the arrangement parameters in the manner described above to provide a complete track in digital music notation format. This is then rendered into audio format by the audio rendering component 12 as described above, thereby generating the full audio render.
This is a process that can typically take anything up to a few minutes, during which time the user has no feedback as to the effect of the arrangement parameters he has provided on the newly-edited track. This is a consequence of the rich and detailed music production functionality that is provided within the music production system.
In order to effectively mitigate this situation, preview rendering functionality is provided that allows a "preview" render of the custom arrangement to be created very quickly (typically within the space of a few seconds or less, but in any event in less time than it takes to generate the full audio render) so that the user may get some sense of the musical structure of the custom arrangement he has caused to be generated before the full audio render is available. The preview render is only an approximation of the full audio render and will generally be of lower musical quality. However, using the preview rendering method that is disclosed herein, an acceptable trade-off between musical quality and preview rendering speed may be attained.
The preview rendering functionality is provided at least in the contexts of audio track creation and audio track editing as described above.
Figure 11 shows a functional block diagram of the above music production system into which a preview controller 130 has been incorporated. Note that Figure 11 does not show every detail of the music production system but only shows the components of it that are considered relevant in the present context. In particular, the description of the flow of messages and data within the music production system is not repeated for the sake of conciseness. However, it will be appreciated that the steps and functions described in relation to Figure 11 may be carried out in accordance with the messaging and data flows described above in relation to the earlier Figures.
Figure 11 shows the production manager 13 receiving a set of arrangement parameters 100 of the kind described above, such as track length (duration), sync point/intensity curve, etc. These may for example be received in a track edit request comprising an identifier of an existing track to be edited. Reference numeral 102 denotes the track ID of the existing track in Figure 11. The production manager 13 processes the arrangement parameters 100 and track ID 102 in the manner described above so as to cause the arrangement layer 4 to generate a custom arrangement of the digital track in digital musical notation format. This is shown towards the bottom right of Figure 11 and is denoted by reference numeral 110. In the present example, the custom arrangement 110 comprises MIDI tracks containing MIDI sequences which are generated by sequencing MIDI segments 104 stored in association with the track ID 102.
Additionally, the custom arrangement 110 comprises rendering settings 112 for rendering the MIDI tracks, such as automation settings 112a, virtual instrument (synthesiser) settings (e.g. virtual instrument preset(s)) 112b and audio effects (FX) settings 112c, over the duration of the track. In the present example, these are determined by modifying track settings 106 stored in association with the existing track ID 102 in accordance with the arrangement parameters 100 provided by the user. The audio rendering component 12 processes the custom arrangement to produce a full audio render thereof, denoted by reference numeral 114.
As described above, there are various intermediate steps which are performed in order to go from the user-defined arrangement parameters 100 to the full custom arrangement 110 within the arrangement layer 4. In parallel, preview rendering steps are carried out as described below, to provide a quick but nonetheless informative preview audio render.
The first of these intermediate steps is the generation of an arrangement envelope, which is denoted in Figure 11 by reference numeral 108.
The arrangement envelope 108 is generated by the arrangement function 6 of the arrangement layer 4 from a set of available "section templates" 116 which are selected in dependence on the user-defined arrangement parameters 100. The arrangement envelope 108 defines a sequence of musical sections for the custom arrangement 110, each of which is assigned one of the section templates 116 by the arrangement function 6. Figure 11 shows three sections 118a, 118b and 118c of the arrangement envelope 108; however, it will be appreciated that this is purely an example and an arrangement envelope can have any number of musical sections. Musical sections could for example correspond to a verse, chorus, middle-eight etc. of a track to be structured in that manner. There may be multiple section templates for different section types, e.g. loud and quiet versions of a chorus. This allows a significant degree of musical variation within a track, by varying its constituent musical sections.
As well as assigning a selected section template to each section - denoted by reference numerals 116a, 116b and 116c for sections 118a, 118b and 118c respectively - the arrangement envelope 108 also defines a duration 120a, 120b and 120c for each section 118a, 118b and 118c. A typical section may have a duration that is a whole number of musical measures (e.g. 2 bars, 4 bars, 8 bars, 16 bars etc.). However, this is not an absolute rule and sections of unrestricted duration may be incorporated in order to accommodate a user-defined duration for the whole arrangement. For example, such sections could be included at the start and/or end of the arrangement envelope 108 or at a suitable pause point(s) within it.
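The arrangement envelope can be pictured as a simple ordered structure along the following lines. This is a sketch only; the field names are illustrative, and the particular template assignments and bar counts are example values (they mirror the 4-bar, 4-bar, 2-bar example discussed in relation to Figure 14 below).

```python
# Sketch of an arrangement envelope 108 with three sections (cf. Figure 11).
arrangement_envelope = [
    {"section": "118a", "template": "116a", "duration_bars": 4},
    {"section": "118b", "template": "116b", "duration_bars": 4},
    {"section": "118c", "template": "116c", "duration_bars": 2},
]
```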
In addition to the sequence of musical sections 118a, 118b, 118c, Figure 11 shows a time-varying intensity curve 122 defined over the cumulative duration of the sequence of sections.
This can be used to provide musical variation across the various sections and within each section individually, by defining changes in musical intensity over time. In the present example, this is the mechanism by which more granular musical variation is incorporated into the custom arrangement 110. The intensity curve 122 is preferably set in accordance with the arrangement parameters 100. For example, it may be defined by one or more sync points of the arrangement parameters 100, or the user may be offered more fine-grained control over the shape and structure of the intensity curve 122, depending on the implementation and/or the user's preferences. By way of example, the intensity curve 122 is shown gradually rising to an intensity peak 123 throughout the first section 118a and the majority of the second section 118b, before dropping off again somewhat more rapidly for the remainder of section 118b and the third and final section 118c. The intensity peak 123 could for example be defined by a sync point of the arrangement parameters 100.
The MIDI sequencing component 8 of the arrangement layer therefore receives the arrangement envelope 108 and associated intensity curve 122 and sequences the MIDI music segments 104 associated with the track ID 102 in order to populate the defined musical sections 118a, 118b and 118c with composed MIDI in accordance with the respective section templates 116a, 116b, 116c associated therewith. The result is corresponding musical sections 124a, 124b, 124c in the custom arrangement 110, having the defined durations 120a, 120b and 120c and populated with sequenced MIDI spanning those respective durations, which has been selected from the existing music segments 104 in accordance with the applicable section templates 116a, 116b and 116c.
As indicated, the intensity curve 122 is used to introduce musical variation across the corresponding sections 124a, 124b and 124c of the custom arrangement 110. As a simple example, the intensity curve 122 can be used to set and vary note "velocity" across the duration of the custom arrangement by modulating a velocity of each note based on the value of the intensity curve 122 at the time of that note. Note velocity is a known concept in electronic music production and sets the dynamics according to which a note is rendered by a virtual instrument (analogous to how hard or soft a traditional musical instrument is played). This provides a simple but effective way of varying the musical dynamics across the custom arrangement 110 in accordance with the user-defined arrangement parameters 100. However, the intensity curve 122 can be used to introduce any desired musical variation within the custom arrangement by modifying one or more of the automation settings 112a, settings of the virtual instruments 112b and settings associated with the audio effects 112c (such as FX settings, send channels/audio routings etc.). This can be achieved via a mapping function which maps the intensity curve 122 at different points in time to whichever settings are desired and in any desired manner. The mapping function can be section-specific as described below.
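As a concrete illustration of the simplest mapping mentioned above - note velocity driven by the intensity curve - the following sketch modulates each note's velocity by the intensity value at the note's start time. The note representation (dicts with 'start' and 'velocity') and the assumption that the intensity curve is a callable returning a value in [0, 1] are simplifications for illustration only.

```python
def apply_intensity_to_velocity(notes, intensity_curve, base_velocity=64, max_velocity=127):
    """Modulate MIDI note velocities according to a time-varying intensity curve (sketch).

    'notes' is assumed to be a list of dicts with 'start' (seconds) and 'velocity';
    'intensity_curve(t)' is assumed to return an intensity in [0, 1] at time t.
    """
    for note in notes:
        intensity = intensity_curve(note["start"])
        # Higher intensity -> harder playing; MIDI velocity is capped at 127.
        note["velocity"] = int(base_velocity + intensity * (max_velocity - base_velocity))
    return notes
```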
The preview controller 130 of Figure 11 controls the generation of a preview render of the custom arrangement 110 for outputting to the user before the full audio render 114 is available. The preview controller 130 is shown receiving the arrangement envelope 108 and associated intensity curve 122 generated by the arrangement function 6, which it uses to determine a set of preview rendering instructions 132 for creating a preview render using audio data of a predetermined "audio clip pack" (see below). This is significantly quicker than the full rendering process, hence the preview render can be made available to the user much more quickly than the full audio render 114.
Figure 12 shows an example of a section template 116n in further detail. Figure 12 is a high-level block diagram illustrating certain structure of the section template 116n. As will be appreciated, the structure of the section template 116n can be embodied using any suitable data format. Reference is made to both Figure 11 and Figure 12 in the description below.
The section template 116n defines one or more musical parts 202a for use in a section that is arranged according to that section template 116n. The section template 116n also defines a musical function 202b for each of those musical parts. As described above, the musical parts 202a may for example correspond to different instruments (such as piano, bass, drums/percussion, strings, synth etc.) and each of those musical parts may have a musical function 202b such as lead, harmony, chords etc. In addition, the section template 116n also defines a mapping function 202c for a section arranged according to that template. The mapping function 202c defines a mapping of intensity values to the rendering settings 112. It is this mapping function 202c that defines how each intensity value on the intensity curve 122 is mapped to one or more of the rendering settings 112, such as automation 112a. That is to say, the mapping function 202c defines how the intensity curve 122 maps onto the rendering settings 112 of the custom arrangement 110 and therefore allows the one or more settings in question to be modulated over the duration of the custom arrangement 110 according to the intensity curve 122.
The section template 116n also defines musical intensity limits 202d, which are upper and lower limits on the possible intensity values for any section arranged according to that template (denoted Imax and Imin respectively). This accounts for the fact that certain arrangements of musical parts 202a and their musical functions 202b may only be musically appropriate within certain intensity limits. For example, it may be that a certain arrangement of musical parts is not considered appropriate for overly "soft" dynamics, hence the lower intensity limit Imin in that event would be increased to prevent that section template from being used with intensity values below that limit.
The MIDI sequencing function 8 uses the above information within the section template 116n to arrange, sequence and automate MIDI in a section of music having that template.
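A section template of the kind shown in Figure 12 might be represented along the following lines. This is a sketch only: the parts, functions and limit values are example choices, and the mapping function here simply scales note velocity, standing in for the richer mappings 202c described above.

```python
# Illustrative representation of a section template 116n (cf. Figure 12).
section_template_116n = {
    "parts": ["piano", "bass", "strings"],                  # musical parts 202a
    "functions": {"piano": "chords",                        # musical functions 202b
                  "bass": "harmony",
                  "strings": "lead"},
    "i_min": 0.3,                                           # intensity limits 202d: this template
    "i_max": 1.0,                                           #   is unsuited to very soft dynamics,
                                                            #   so the lower limit is raised
    # Mapping function 202c: how an intensity value drives the rendering settings 112.
    "map_intensity": lambda i: {"velocity_scale": 0.5 + 0.5 * i},
}
```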
The intensity curve 122 can be used to drive an essentially infinite range of musical variation within sections in accordance with the respective mapping function assigned to each section. Hence there are infinite possibilities when it comes to introducing musical variation into the final audio render 114.
When it comes to generating the preview render according to the preview rendering instructions 132, the scope for introducing musical variation is much more limited. This is an acceptable trade-off that is made in order to be able to provide the preview render quickly, and by implementing the preview rendering method disclosed below an adequate degree of musical variation can still be introduced to approximate the finer musical variation that will eventually be exhibited in the final audio render 114.
In order to achieve this, as shown to the right-hand side of Figure 12, each section template 116n has associated with it a set of pre-generated audio clips (that is, music segments that have been pre-rendered into audio format) which is referred to herein as a "section audio clip pack" 204n for the section template 116n. Each section audio clip pack 204n is associated with the corresponding section template 116n. Each audio clip 208 in the audio clip pack 204n is an audio render of a section of the track in question that has been rendered at a particular section duration 210 and with a particular intensity setting or settings 212. A track is associated with a set of section templates which may be used to arrange that track. The section audio clip packs of Figure 12 that are associated with those section templates constitute a "track audio clip pack" containing all of the audio clips that may be used to create a preview render of that track. The track audio clip pack is associated with the track ID 102 of that track. A section audio clip pack 204n may have an associated clip pack ID 206 and/or the track audio clip pack may have an associated clip pack ID (which may comprise the applicable track identifier 102). That is to say, in order to generate each audio clip in the audio clip pack, the MIDI music segments 104 for track ID 102 have been arranged and rendered according to the section template 116n in exactly the same way as described above - i.e. first into digital musical notation format by the MIDI sequencing function 8, and then rendered into audio by the audio rendering component 12 - but in an earlier pre-rendering phase using a predetermined section duration and predetermined intensity curve. Hence, different audio clips within a clip pack can differ in any respect - within the limits of the corresponding section template - that can be varied by varying the parameters used to generate them within the music production system.
In the example of Figure 12, the section clip pack 204n is shown to comprise six audio clips. Three of the audio clips have been rendered at a section duration of 2 bars and the other clips at a section duration of 4 bars. For each of those section durations, the audio clips have been rendered with an intensity curve that, in this example, is flat across that section duration at intensity value Imin (i.e. the lower intensity limit set in the section template 116n), at Imax (i.e. the upper intensity limit set in the section template 116n), and at an intermediate flat intensity value half way between the two, Imed = (Imin + Imax)/2. This is just one example of reasonable intensity settings that may be used to pre-generate the audio clips in a clip pack, and alternative intensity settings (including non-flat intensity with different intensity values at different times) may be used in the pre-rendering phase.
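The six clips of this example clip pack could be indexed by their production metadata as sketched below. The limit values and placeholder file names are assumptions; only the structure (two durations, each pre-rendered at flat Imin, Imed and Imax settings) follows the example above.

```python
# Sketch of the section audio clip pack 204n of Figure 12: two section durations,
# each pre-rendered at three flat intensity settings (Imin, Imed, Imax).
I_MIN, I_MAX = 0.3, 1.0            # example limit values from the section template
I_MED = (I_MIN + I_MAX) / 2

clip_pack_204n = [
    {"duration_bars": bars, "intensity": level, "audio": f"clip_{bars}bars_{name}.wav"}
    for bars in (2, 4)
    for name, level in (("imin", I_MIN), ("imed", I_MED), ("imax", I_MAX))
]
```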
Figure 14 schematically illustrates, by example, how a preview audio render 314 may be generated using audio clip packs. The arrangement envelope 108 of Figure 11 is shown together with the associated intensity curve 122 defined across the arrangement envelope's cumulative duration. Each section 118a, 118b, 118c is matched to one of the available section clip packs based on the section template 116a, 116b and 116c assigned to that section. The associated section clip pack is the clip pack associated with that section template, and the section clip packs associated with sections 118a, 118b and 118c are denoted by reference numerals 204a, 204b and 204c in Figure 14.
For each section 118a, 118b, 118c, an audio clip from the associated section clip pack 204a, 204b and 204c is selected based on the duration of that section 120a, 120b and 120c and the portion of the intensity curve 122 within that section. The portions of the intensity curve 122 within the first, second and third sections 118a, 118b and 118c respectively are denoted by reference numerals 122a, 122b and 122c.
In some implementations, there is a one-to-one relationship between sections and audio clips used for the preview render, i.e. one audio clip is used per section to approximate that entire section.
Alternatively, audio data of multiple audio clips may be used for a single section when generating the preview render. For example, different musical parts may be rendered as separate audio clips, which can be mixed together within a given section of the preview render. As another example, a section associated with a particular section template could be divided into multiple sub-sections, and audio data from different clips of the section audio clip pack associated with that section could be used in the different sub-sections (in this respect, it is noted that the term "section" includes a sub-section of this nature unless otherwise indicated). Whilst this marginally increases the complexity of the preview rendering process, the impact is relatively negligible, and it is still significantly faster to create a preview render in this way than it is to generate a full audio render from scratch. Certain audio data may also be incorporated into a section at a time that is determined flexibly. For example, a pre-rendered drum fill could be introduced at a point in a section defined by the intensity curve, to match its expected timing in the full render.
In some circumstances it may be possible to exactly match the duration of a section to an audio clip in the associated section clip pack of the same duration, particularly where the section duration is a whole number of measures.
However, in general it will not be possible to exactly match the portion of the time-varying intensity curve 122a, 122b, 122c within a given section to the discrete intensity settings associated with each audio clip in a clip pack. Hence the matching of the intensity curve to the audio clips is an approximate matching, in which the audio clip that was rendered with intensity settings closest to that portion of the intensity curve is selected. The selected audio clip is not expected to exactly match the corresponding section of the final audio render 114. In the example of Figure 14, audio clips of duration 4 bars are selected for the first and second sections 118a, 118b from the respective associated section clip packs 204a, 204b, and an audio clip of duration 2 bars is selected for the third section 118c from the associated section clip pack 204c, so as to match the section durations 120a, 120b and 120c defined in the arrangement envelope 108. In the first section 118a, the portion of the intensity curve 122a remains at relatively low intensity values, therefore the audio clip of the correct duration rendered at the minimum intensity setting Imin (i.e. flat intensity curve at Imin) is selected to approximate that section. This clip is denoted by reference numeral 314a. For the second section 118b, containing the intensity peak 123, the closest matching clip is that rendered at the maximum intensity setting Imax (i.e. flat intensity curve at Imax), hence the audio clip rendered at that intensity setting is selected. This clip is denoted by reference numeral 314b. For the third section 118c, the two-bar clip at the medium intensity setting Imed is selected as the closest approximation of the portion of the intensity curve 122c in that section. This clip is denoted by reference numeral 314c. The preview audio render 314 is created by sequencing the selected audio clips 314a, 314b and 314c in the order of the corresponding sections 118a, 118b and 118c.
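The approximate matching step can be sketched as a nearest-value search over the clip packs' production metadata, followed by simple concatenation of the selected clips. The data shapes and the use of the mid-point of each section to summarise its portion of the intensity curve are assumptions made for the example; any reasonable summary of the curve portion could be used instead.

```python
def select_preview_clips(envelope, clip_packs, intensity_curve):
    """Pick one pre-rendered clip per section to approximate the full render (sketch).

    'envelope' is assumed to be a list of sections with 'template', 'duration_bars',
    'start' and 'end' times; 'clip_packs' maps template IDs to clip lists like the
    one sketched earlier; 'intensity_curve(t)' returns the intensity at time t.
    """
    selected = []
    for section in envelope:
        # Summarise the portion of the intensity curve within this section.
        mid_time = 0.5 * (section["start"] + section["end"])
        target = intensity_curve(mid_time)
        # Only clips of the correct duration are candidates.
        candidates = [c for c in clip_packs[section["template"]]
                      if c["duration_bars"] == section["duration_bars"]]
        # Choose the clip whose pre-rendered intensity setting is closest to the target.
        selected.append(min(candidates, key=lambda c: abs(c["intensity"] - target)))
    return selected  # sequencing these clips in order yields the preview render 314
```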
For comparison, the final audio render 114 that will eventually be generated is shown towards the bottom of Figure 14.
Figure 13 illustrates how, by applying the above steps, the continuously time-varying intensity curve 122 is approximated as a series of intensity step changes, shown on the right-hand side of that figure.
As noted, the intensity settings can be used to introduce rich dynamic variations within the final render 114 that will not be fully captured in the preview render 314. However, the preview render 314 will nonetheless be a reasonable approximation of the overall musical structure of the final render 114 that is still in progress. This is because its constituent audio clips within the section clip packs 204a, 204b and 204c have been pre-produced and rendered using the same music production pipeline as the custom arrangement 110, and using intensity settings which at least approximately match the time-varying intensity settings used to produce the custom arrangement 110.
Figure 15 shows a preview rendering component 300 which receives the preview rendering instructions 132 from the preview controller 130 and creates the preview render 314 from a set of stored section clip packs 204 in the manner described above. These audio clip packs 204 make up the track audio clip pack for the track in question. The preview rendering instructions 132 are shown to define a sequence of three musical sections and, for each of those sections, a section clip pack ID denoted by reference numerals 206a, 206b and 206c for sections 118a, 118b and 118c respectively. The instructions 132 also define the respective durations 120a, 120b and 120c of those sections and respective intensity settings 322a, 322b and 322c for each of those sections, which are used for matching to the audio clips in the associated section clip pack. As will be appreciated, this is just one example mechanism by which the instructions 132 may identify the audio clips to be sequenced. In general, the audio rendering instructions 132 can comprise any clip identification data identifying audio clips within the track audio clip pack 204 to be sequenced in order to create the preview render, as well as specifying the manner in which the audio data of those clips should be sequenced.
To allow the rendering instructions 132 to be matched to the audio clips, each audio clip 208 in each section clip pack 204n has associated production metadata which, in this example, indicates the duration and intensity settings according to which it was generated. This production metadata is stored in the section audio clip pack 204n in association with its constituent audio clips.
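Purely as an illustration of how such production metadata might be stored alongside the clips, a section clip pack could carry a small manifest of the following kind. The file layout, field names and intensity labels are assumptions for the sketch, not details disclosed in the patent.

```python
# Hypothetical on-disk layout for a section clip pack: each audio file is
# listed in a small JSON manifest together with the production metadata
# (duration and intensity setting) it was rendered with.
import json

manifest = {
    "section_clip_pack_id": "204a",
    "clips": [
        {"file": "clip_imin_4bars.wav", "duration_bars": 4, "intensity": "Imin"},
        {"file": "clip_imed_4bars.wav", "duration_bars": 4, "intensity": "Imed"},
        {"file": "clip_imax_4bars.wav", "duration_bars": 4, "intensity": "Imax"},
    ],
}

with open("section_204a_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```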
These intensity settings 322a, 322b and 322c are determined by the preview controller by approximately matching the portions of the intensity curve in each of those sections 122a, 122b and 122c to an appropriate audio clip in the section clip pack 204a, 204b, 204c associated with that section, based on the production metadata therein. Hence, the preview rendering instructions 132 indicate which section clip packs to use and which individual clips within those section clip packs to use to generate the preview render 314. All the preview rendering component 300 needs to do is sequence the audio data of the identified audio clips in accordance with the instructions 132.
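The sequencing step itself can then be very lightweight, along the lines of the following sketch. The instruction format is a hypothetical stand-in for the clip identification data referred to above, and the in-memory representation of the clips as NumPy arrays is an assumption.

```python
# Sketch of the preview rendering step: given instructions that name, per
# section, a clip pack and a clip within it, concatenate the identified audio
# data in section order.
import numpy as np

def create_preview_render(instructions, clip_packs):
    """instructions: list of {"pack_id": ..., "clip_id": ...} dicts in section order.
    clip_packs: {pack_id: {clip_id: np.ndarray of samples}}."""
    segments = [clip_packs[step["pack_id"]][step["clip_id"]] for step in instructions]
    return np.concatenate(segments)

packs = {
    "204a": {"314a": np.zeros(4 * 44100, dtype=np.float32)},
    "204b": {"314b": 0.1 * np.ones(4 * 44100, dtype=np.float32)},
    "204c": {"314c": np.zeros(2 * 44100, dtype=np.float32)},
}
steps = [{"pack_id": "204a", "clip_id": "314a"},
         {"pack_id": "204b", "clip_id": "314b"},
         {"pack_id": "204c", "clip_id": "314c"}]
preview = create_preview_render(steps, packs)   # audio data of the three sections, sequenced
```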
The above description is just one example of how different arrangement parameters (in this case, defining the intensity curve) can be accommodated in the present context. However, the invention is not limited in this respect, and can be applied in the context of any music production parameters, such as arrangement and/or composition parameters. The underlying principles remain the same in that case. For example, with varying composition parameters, clip packs can be generated containing audio clips that have been pre-composed with different composition settings and rendered into audio format. Where compositional variation is introduced, clips within a section template may have different melodic content. This is a consequence of them having been rendered from digital music having different symbolic music data (i.e. different musical notes).
Arrangement parameters may also be used to determine composition in the final render to some extent. For example, the arrangement layer may request that the composition layer re-compose a certain section of an existing track in order to better fit a custom arrangement. In that event, the preview render can use a pre-composed segment selected to approximate the expected output of the composition layer.
The above could also be implemented in the context of production as a service. In that instance, the pre-generated music segments can comprise pre-generated audio renders of the user's own MIDI compositions.
Figure 16 shows a schematic block diagram of a particularly efficient hardware architecture for implementing the preview rendering functionality. In this example, the main audio production pipeline and the preview controller 130 are implemented at a remote music production computer system 350 that is accessed from a local user device 352 via a network 354 such as the Internet. The main music production pipeline is denoted by reference numeral 1 and comprises the composition, arrangement, production and audio rendering functionality described extensively above.
However, the preview rendering component 300 that actually generates the preview render 314 is implemented at the local user device 352. In order to allow a particularly fast rendering of the preview audio render 314, when a user wishes to edit a track, the set of section audio clip packs 204 associated with its track ID 102 (which constitute the track audio clip pack associated with that track) are downloaded from the remote music production computer system 350 to the local user device 352 via the network 354 and are stored in local storage of the user device 352 (step S1). This set contains the section audio clip pack associated with every possible section template that can be used to generate that track (so if there are twenty section templates for the arrangement layer 4 to choose from, the twenty associated audio clip packs are downloaded).
The user provides user inputs at a user interface 356 of the local user device 352 in order to vary the arrangement parameters 100 of the track in the manner described above. These arrangement parameters 100 are transmitted to the remote music production computer system 350 in one or more electronic messages, such as an edit request, to allow the above steps to be carried out at the music production computer system 350 (for example, using the API architecture described above, or via a Web interface etc.). This is shown as step S2. In response to the received arrangement parameters, the main music production pipeline 1 begins the process of arrangement that will eventually result in the custom arrangement 110. In parallel, the preview controller 130 generates, at the earliest opportunity (i.e. once the arrangement envelope 108 is available), the preview rendering instructions 132 referred to above and transmits these back to the user device at step S4, so that the preview rendering component 300 can create the preview audio render 314 using the pre-downloaded section audio clip packs in accordance with those instructions, whilst the remote processing within the main audio production pipeline 1 is still ongoing. The preview render 314 can therefore be created and played out to the user at the local user device 352 promptly after the user has provided his preferred arrangement parameters 100. The generation of the full audio render 114 can be instigated automatically, in response to the user's initial edit request, whilst the user is listening to the preview of the edit. However, preferably, the full audio render of an arrangement only begins after a separate request is sent by the user, after they have listened to a preview of that arrangement and decided they want the full audio render version. This is beneficial because not all previews have to be turned into full renders: the user may not wish to proceed on the basis of a current preview render and may wish to make further changes first. This can result in a significant saving of computational resources, by discouraging the generation of unwanted full renders.
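The client-side flow of steps S1, S2 and S4 might be sketched as follows. This is only an assumed shape for the exchange: the endpoint paths, the job_id field and the use of the requests library are illustrative inventions, not part of the disclosed API.

```python
# Hedged sketch of the local-device flow of Figure 16: download the track's
# clip packs once, submit edited arrangement parameters, then fetch the
# preview rendering instructions so the preview can be built locally while
# the full render is still in progress remotely.
import requests

BASE = "https://music-production.example.com/api"   # hypothetical endpoint

def edit_and_preview(track_id, arrangement_params, local_store):
    # Step S1: download and cache the track audio clip pack (all section clip packs).
    if track_id not in local_store:
        local_store[track_id] = requests.get(f"{BASE}/tracks/{track_id}/clip-packs").json()

    # Step S2: submit the user's arrangement parameters as an edit request.
    job = requests.post(f"{BASE}/tracks/{track_id}/edits", json=arrangement_params).json()

    # Step S4: fetch the preview rendering instructions as soon as they are ready,
    # then sequence the locally cached clips according to them (see earlier sketch).
    instructions = requests.get(f"{BASE}/jobs/{job['job_id']}/preview-instructions").json()
    return instructions, local_store[track_id]
```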
When the full audio render 114 is eventually completed some time later, it is transmitted to the user device 352 for playing out to the user. This may be several minutes after the user has provided his preferred arrangement settings 100, but during that time he has already had an opportunity to preview an approximation of its musical structure.

The various components referred to above, and in particular the production management component 13, the production engine 3 (that is, the audio rendering component 12, the performance component 10 and the arrangement component 4), the composition engine 2, the preview controller 130 and the preview rendering component 300, are functional components of the system that are implemented in software. That is, the composition system comprises one or more processing units (such as general purpose CPUs, special purpose processing units such as GPUs or other specialized processing hardware, or a combination of general and special purpose processing hardware) configured to execute computer-readable instructions (code) which cause the one or more processing units to implement the functionality of each component described herein. Specialized processing hardware such as GPUs may be particularly appropriate for implementing certain parts of the ML functionality of the composition engine 2, and of the other components when those are implemented using ML. The processing unit(s) can be embodied in a computer device or network of cooperating computer devices, such as a server or network of servers. In this context, the system refers to the overall system which may, as noted, include the user device 352 on which certain functionality is implemented.
Figure 9 shows a schematic block diagram illustrating some of the structure of the API 14, which is shown to comprise a computer interface 42 and a request manager 44 coupled to the computer interface 42. The request manager 44 manages the requests received at the computer interface 42 as described above. In particular, the request manager 44 allocates each request to an appropriate one of the job queues 31 and assigns a unique job identifier (ID) to each request (both internal and external). The job IDs serve various purposes which are described later. The API 14 can be implemented as a server (API server) or server pool. For the latter, the request manager 44 can be realized as a pool of servers, and the computer interface 42 can be provided at least in part by a load balancer which receives requests on behalf of the server pool and allocates each request to one of the servers of the server pool 44, which in turn allocates it to the appropriate job queue. More generally, the API 14 is in the form of at least one computer device (such as a server) and any associated hardware configured to perform the API functions described herein. The computer interface 42 represents the combination of hardware and software that sends and receives requests, and the request manager 44 represents the combination of hardware and software that manages those requests. Requests are directed to a network address of the computer interface, such as a URL or URI associated therewith.
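As a rough sketch of the request-manager behaviour (assigning a unique job ID and placing each request on a job queue), one could imagine something like the following. The queue names and the routing rule are invented for illustration only.

```python
# Minimal sketch of a request manager that assigns a unique job ID to each
# incoming request and places it on an appropriate job queue.
import queue
import uuid

class RequestManager:
    def __init__(self):
        # Illustrative queue names; the actual job queues 31 are not specified here.
        self.job_queues = {"composition": queue.Queue(), "render": queue.Queue()}

    def handle(self, request):
        job_id = str(uuid.uuid4())              # unique job ID assigned to every request
        q_name = request.get("type", "render")  # crude routing rule, for the sketch only
        self.job_queues.setdefault(q_name, queue.Queue()).put(
            {"job_id": job_id, "request": request})
        return job_id

manager = RequestManager()
jid = manager.handle({"type": "render", "track_id": "102"})
```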
The API 14 can be a Web API, with at least one Web address provided for this purpose. One or multiple such network addresses can be provided for receiving incoming requests.
Whilst the above has been described in terms of specific embodiments, these are not exhaustive. The scope of the invention is not defined by the described embodiments but only by the accompanying claims.

Claims (29)

  1. A computer-implemented method of rendering music into audio format, the method comprising: receiving, at a music production system, one or more music production parameters defined by a user for producing a custom piece of music; using the user-defined music production parameters to produce a custom piece of music in digital musical notation format; rendering the custom piece of music into audio format for outputting to the user; and creating, for outputting to the user before the rendering of the custom piece of music has completed, a preview audio render thereof, using pre-generated music segments stored in audio format, the music segments having been generated by producing multiple sections of music according to different predetermined music production parameters, and rendering the multiple sections of music into audio format; wherein the pre-generated music segments are stored in association with production metadata indicating the predetermined music production parameters used to produce them; and wherein the preview render is created by matching sections of the custom piece of music to different ones of the pre-generated music segments, based on the user-defined music production parameters and the production metadata, and sequencing audio data of the different pre-generated music segments, the preview render comprising the sequenced audio data.
  2. The method of claim 1, wherein at least two of the pre-generated music segments differ in at least one of the following respects: musical dynamics, duration, tempo, melodic content, musical parts, combination of musical parts, musical function of musical parts, instrument or sound characteristics, or melodic content; wherein that or those differences are indicated by the music production metadata.
  3. The method of claim 1 or 2, wherein the music segments have been generated initially in digital musical notation format and rendered therefrom into audio format.
  4. The method according to claim 3, wherein at least two of the music segments in digital musical notation format have different: automation settings, dynamics settings, instrument or sound settings, symbolic music data, digital effects settings, wherein that or those differences are indicated by the music production metadata.
  5. The method of any preceding claim, wherein the one or more user-defined music production parameters and the production metadata are used to determine an order in which the audio data of the different pre-generated music segments are to be sequenced.
  6. The method of any preceding claim, wherein the one or more music production parameters comprise at least one of: a duration for the custom piece of music, and a timing for a musical event within the custom piece of music.
  7. The method of claim 6, wherein the musical event is a concentration of musical intensity.
  8. The method of any preceding claim, wherein the one or more user-defined music production parameters are processed by at least one music production component of the music production system so as to autonomously determine one or more further music production parameters based thereon, wherein the one or more autonomously-determined music production parameters are used to produce the custom piece of music, and the sections of the custom piece of music are matched to said different ones of the pre-generated music segments by comparing the autonomously-determined music production parameters with the production metadata.
  9. The method of claim 8, wherein the one or more autonomously-determined music production parameters comprise at least one of: one or more musical part parameters for the custom piece of music, and one or more composition settings for the custom piece of music.
  10. The method of claim 8, wherein the one or more autonomously-determined music production parameters comprise one or more composition settings for selecting, from a set of available probabilistic sequence models of a composition engine, one or more probabilistic sequence models for autonomously composing music for at least one section of the custom piece of music.
  11. The method of claim 8, 9 or 10, wherein the music production component is an artificial intelligence music production component.
  12. The method of any preceding claim, wherein the sections of the custom piece of music are matched to said different ones of the pre-generated music segments by comparing the user-defined music production parameters with the production metadata.
  13. The method of any preceding claim, wherein the custom piece of music exhibits musical variations determined from the music production parameters, and the sections are matched to the pre-generated music segments to approximate the musical variations in the preview render.
  14. The method of claim 13, wherein the musical variations are determined from an intensity curve defined by the music production parameters and the sections are matched to the pre-generated music segments by matching a portion of the intensity curve in each section to at least one of the pre-generated music segments.
  15. The method of claim 14, wherein at least one of the music segments has been generated with a flat intensity curve having a single intensity value across a duration of that music segment.
  16. The method of claim 14 or 15, wherein at least one of the music segments has been generated with a time-varying intensity curve having different intensity values within a duration of that music segment.
  17. The method of any of claims 13 to 16, wherein the musical variations are introduced by varying at least one of the following: dynamic settings, automation settings, tempo, composition settings, musical parts, a musical function of at least one musical part, and instrument or sound settings.
  18. The method of any preceding claim, wherein the custom piece of music is a custom arrangement of an existing piece of music and the music production parameters comprise one or more arrangement parameters for determining the custom arrangement.
  19. The method of claim 18, wherein the custom arrangement is an arrangement of predetermined music elements.
  20. The method of claim 19, wherein at least one of the predetermined music elements is re-composed automatically, by a composition engine of the music production system, to fit the custom arrangement.
  21. The method of any preceding claim, wherein the music production parameters comprise one or more composition parameters and a composition engine of the music production system is caused to autonomously compose at least one music element for use in at least one section of the custom arrangement based on the composition parameters.
  22. The method of any preceding claim, wherein the custom piece of music is produced by determining section templates for the sections of the custom piece of music to be produced, and multiple pre-generated music segments are associated with each section template for matching to a section having that section template.
  23. The method of claim 22, when dependent on claim 14, wherein each section template comprises at least one of an upper intensity limit and a lower intensity limit, wherein a section template is selected for each section of the custom piece of music by comparing the portion of the intensity curve in that section with the intensity limit(s) of that section template, wherein the music segments associated with each section template are generated within the intensity limit(s) of that section template.
  24. The method of claim 22 or 23, wherein the section template defines at least one of: musical parts, a musical function of at least one musical part, a mapping of intensity curve values to automation settings.
  25. The method of any preceding claim, wherein the step of rendering the custom piece of music into audio format is instigated in response to a full render request received from the user after the preview render has been made available for outputting to the user.
  26. The method of any preceding claim, wherein the music production parameters comprise one or more performance parameters which are used to introduce performance variation into at least one section of the custom arrangement.
  27. A user device for creating an audio render of a custom piece of music, the user device comprising: a network interface for communicating with a remote music production system; a user interface for receiving user inputs from a user of the user device; memory; one or more processors configured to execute computer-readable music production code which is configured, when executed, to cause the one or more processors to carry out the following operations: downloading from the remote music production system and storing in the memory of the user device a set of pre-generated music segments in audio format; processing user inputs received at the user interface to determine at least one music production parameter for producing a custom piece of music; generating and transmitting to the music production system at least one electronic message comprising the at least one music production parameter; receiving from the music production system arrangement instructions for creating at the user device an audio render of the custom piece of music; and sequencing audio data of at least two of the pre-generated music segments according to the arrangement instructions, and thereby creating an audio render of the custom piece of music comprising the sequenced audio data.
  28. A computer program product comprising computer-readable code stored on a non-transitory computer-readable storage medium, which is configured, when executed on one or more processors, to carry out the operations of any preceding claim.
  29. A music production system comprising: at least one input configured to receive user-defined music production parameters; and one or more processors configured to execute computer-readable code which, when executed, causes the one or more processors to carry out the steps of any of claims 1 to 25.
GB1820266.3A 2018-12-12 2018-12-12 Automated music production Active GB2581319B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1820266.3A GB2581319B (en) 2018-12-12 2018-12-12 Automated music production
PCT/IB2019/060674 WO2020121225A1 (en) 2018-12-12 2019-12-11 Automated music production

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1820266.3A GB2581319B (en) 2018-12-12 2018-12-12 Automated music production

Publications (3)

Publication Number Publication Date
GB201820266D0 GB201820266D0 (en) 2019-01-30
GB2581319A true GB2581319A (en) 2020-08-19
GB2581319B GB2581319B (en) 2022-05-25

Family

ID=65147312

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1820266.3A Active GB2581319B (en) 2018-12-12 2018-12-12 Automated music production

Country Status (2)

Country Link
GB (1) GB2581319B (en)
WO (1) WO2020121225A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4020458A1 (en) * 2020-12-28 2022-06-29 Bellevue Investments GmbH & Co. KGaA Method and system for template based variant generation of hybrid ai generated song
EP4024392A1 (en) * 2020-12-31 2022-07-06 Bellevue Investments GmbH & Co. KGaA Method and system for energy-based song construction
GB2605440A (en) * 2021-03-31 2022-10-05 Daaci Ltd System and methods for automatically generating a musical composition having audibly correct form
GB2615224A (en) * 2021-03-31 2023-08-02 Daaci Ltd System and methods for automatically generating a musical composition having audibly correct form
GB2615222A (en) * 2021-03-31 2023-08-02 Daaci Ltd System and methods for automatically generating a musical composition having audibly correct form
GB2615223A (en) * 2021-03-31 2023-08-02 Daaci Ltd System and methods for automatically generating a musical composition having audibly correct form
GB2615221A (en) * 2021-03-31 2023-08-02 Daaci Ltd System and methods for automatically generating a musical composition having audibly correct form

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102505818B1 (en) * 2018-05-24 2023-03-07 에이미 인코퍼레이티드 Music generator
CN112037745B (en) * 2020-09-10 2022-06-03 电子科技大学 Music creation system based on neural network model
WO2022160054A1 (en) * 2021-01-29 2022-08-04 1227997 B.C. Ltd. Artificial intelligence and audio processing system & methodology to automatically compose, perform, mix, and compile large collections of music
CN113903367B (en) * 2021-09-30 2023-06-16 湖南卡罗德钢琴有限公司 Collecting and restoring method based on piano full-intelligent system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001086628A2 (en) * 2000-05-05 2001-11-15 Sseyo Limited Automated generation of sound sequences
US20060180007A1 (en) * 2005-01-05 2006-08-17 Mcclinsey Jason Music and audio composition system
EP1956586A2 (en) * 2007-02-09 2008-08-13 Avid Technology, Inc. System and method of generating audio sequences of prescribed duration

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7169996B2 (en) * 2002-11-12 2007-01-30 Medialab Solutions Llc Systems and methods for generating music using data/music data file transmitted/received via a network
JP4626376B2 (en) * 2005-04-25 2011-02-09 ソニー株式会社 Music content playback apparatus and music content playback method
US9721551B2 (en) * 2015-09-29 2017-08-01 Amper Music, Inc. Machines, systems, processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptions


Also Published As

Publication number Publication date
GB201820266D0 (en) 2019-01-30
WO2020121225A1 (en) 2020-06-18
GB2581319B (en) 2022-05-25

Similar Documents

Publication Publication Date Title
US11610568B2 (en) Modular automated music production server
GB2581319A (en) Automated music production
KR102459109B1 (en) music generator
US11657787B2 (en) Method of and system for automatically generating music compositions and productions using lyrical input and music experience descriptors
CN103597543B (en) Semantic audio track mixer
US9070351B2 (en) Adjustment of song length
US10964299B1 (en) Method of and system for automatically generating digital performances of music compositions using notes selected from virtual musical instruments based on the music-theoretic states of the music compositions
US20210110802A1 (en) Method of and system for automated musical arrangement and musical instrument performance style transformation supported within an automated music performance system
US11024275B2 (en) Method of digitally performing a music composition using virtual musical instruments having performance logic executing within a virtual musical instrument (VMI) library management system
JP2008164932A (en) Music editing device and method, and program
US7750229B2 (en) Sound synthesis by combining a slowly varying underlying spectrum, pitch and loudness with quicker varying spectral, pitch and loudness fluctuations
WO2014086935A2 (en) Device and method for generating a real time music accompaniment for multi-modal music
WO2020000751A1 (en) Automatic composition method and apparatus, and computer device and storage medium
US20240038205A1 (en) Systems, apparatuses, and/or methods for real-time adaptive music generation
Holbrow Fluid Music
JP2024501519A (en) Generation and mixing of audio arrangements

Legal Events

Date Code Title Description
COOA Change in applicant's name or ownership of the application

Owner name: BYTEDANCE INC.

Free format text: FORMER OWNER: JUKEDECK LIMITED