WO2006027605A2 - Extendible media content rendering system - Google Patents

Extendible media content rendering system

Info

Publication number
WO2006027605A2
WO2006027605A2 (PCT/GB2005/003487; GB2005003487W)
Authority
WO
WIPO (PCT)
Prior art keywords
media content
code
program code
client device
content
Prior art date
Application number
PCT/GB2005/003487
Other languages
French (fr)
Other versions
WO2006027605A3 (en)
Inventor
Peter Murray Cole
Original Assignee
Tao Group Limited
Priority date
Filing date
Publication date
Application filed by Tao Group Limited
Publication of WO2006027605A2
Publication of WO2006027605A3

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44521Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F9/44526Plug-ins; Add-ons
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2353Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25808Management of client data
    • H04N21/25833Management of client data involving client hardware characteristics, e.g. manufacturer, processing or storage capabilities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the present invention relates to an extendible media content rendering system, and particularly although not exclusively to such a system for use with embedded devices such as mobile phones.
  • There is currently no way for content authors to create audio content (e.g. pieces of music, sounds, sound effects and spot effects) such that it can be delivered with a very flexible set of sounds and rendered consistently on a wide range of devices with a wide range of CPU capabilities. More generally, this is a problem with all types of media content.
  • Music delivered in electronic device-readable form typically falls into two types, 'score' format and 'audio' format.
  • the former contains a symbolic list describing which notes to play.
  • the second provides an encoding of an actual audio performance, i.e. one which represents the sound that the listener will actually experience.
  • Audio formats attempt to encode all aspects of the listeners' experience (in much the same way as a track on an audio CD), but data in these formats tends to be large and costly in terms of bandwidth, and is entirely fixed in its representation. Score formats are small in size and therefore cheap on bandwidth, but suffer from a lack of information on how to perform them precisely.
  • Score formats attempt to get round this problem by incorporating data which specifies the 'instrument' to be used.
  • a MIDI score specifies the General MIDI instrument to use to play a specific set of musical notes (e.g. "Grand Piano").
  • this relies on the playback device having a pre-installed set of instruments with which to perform (render) the score.
  • the sound of each instrument set is not completely consistent across platforms; each MIDI platform has its own General MIDI set implementation, each of which can sound very different even when rendering the same MIDI score. This leads to inflexibility in that the score can only choose from among the instruments on the device; it cannot create its own instruments and send those with the score, and the content author cannot predict exactly how the music will sound on each platform which receives it.
  • SAOL (Structured Audio Orchestra Language) is an advanced score format which attaches high-level code, describing sound synthesis models, to score-format content.
  • Audio plugin architectures such as VST exist which allow for pre-compiled code to be incorporated by audio authors into desktop authoring packages, for the purpose of achieving flexible sound manipulation, to create data in audio format (or sometimes to create sample data for use in score-based contexts).
  • Such plugins must be pre-compiled into native code appropriate to the local device used for authoring content; the plugins are installed manually by content authors on their own authoring devices.
  • the plugins used in such a content are used in-situ by the content authors; they are not required for customers to listen to the audio content which might be created using such tools.
  • the SSEYO Koan Interactive Audio platform allows for audio content to be expressed as "Vector Audio", which allows content to be expressed in a compact form which includes parameters defining how to use pre-installed audio synthesizer components within a flexibly defined network.
  • the Koan Vector Audio solution allows for an arbitrary network of the pre-installed synth components to be pre-defined (together with information on notes to play, and also with parameters which define how to set specific controller values which are exposed by the pre-installed synth units).
  • Koan Vector Audio may be delivered within a container file (e.g. MIDI files or as text within HTML files) if required.
  • a content provider will distribute a stream of data that represents media of some form (such as audio, video, text or animation). If this is in a standard format (such as mpeg2), then the content provider may rely on the client system (that will play the content) to be able to handle it.
  • the client system will have a mechanism of obtaining the data, converting it to the required output format and presenting it to the user. This normally involves several steps.
  • the currently installed mechanisms may not be suitable for various reasons:
  • the content provider may want to change the way in which the media is processed. For instance, a frequency filter may be desired on an audio clip.
  • the content may be encrypted with a proprietary mechanism.
  • the content provider may wish to add a specific form of visualisation to an audio clip.
  • the codec will likely come from a third party rather than the content provider. Thus the content provider must rely on the security and availability of the systems of this third party.
  • a method of extending media content by embedding within content to be streamed program code or a reference to such code which, when said code is run by a client device receiving the media content, enables the creation on the client device of a program module for use in rendering the content.
  • an extendible media content rendering system comprising a client device arranged to receive media content which has embedded within it program code, or a reference to such code, the device using the code to enable the creation of a program module for use in rendering the content.
  • the invention further extends more generally to a computer program, to a data format, and to a datastream. It further extends to a data format and to a computer program when stored on computer-readable media.
  • This invention provides a means of embedding extended signal processing code ("DSP code") into media content, where that content describes a network of media processing operations (i.e. it describes the topology of a "media streaming system").
  • Signal processing code may be used to express the logic required to perform a set of DSP operations. Such operations are described as “DSP Units” (or “Units”), and each Unit might be used to:
  • generate data: a data "source" such as a tone generator in an audio rendering system, a sequence of meta event information such as MIDI commands, a graphical data stream in a screen-saver style media system, or a unit used to deliver graphical data either from a data area defined within the media data itself or delivered dynamically through a stream
  • filter data: DSP operations such as filtering/chorus/reverb in an audio rendering system such as a software synthesiser, or a graphical effects unit in a more general graphics media presentation
  • the core media system may manage the inter-communicating network of DSP units to take source data / filter source data / sink data in the manner of a traditional media streaming system (including MIDI or audio rendering systems) to deliver resultant data to the target device or host "player" software system.
  • the core media system might also be used to feed-in external data to the stream (for example, externally supplied MIDI event data or externally supplied graphical file data).
  • the core media system may contain a "library" of pre-installed DSP Units expressed as classes for the intent system.
  • the invention may (but need not) be used in conjunction with Tao Group's intent system which incorporates a binary-portable operating system ("Elate”) with an extensive multimedia system providing a means of executing multimedia applications on a wide-variety of hardware platforms. These applications are written in VP ("Virtual Processor"), a binary portable code.
  • the invention preferably includes the ability to embed sound synthesis code into music content of a score-type format (such as MIDI).
  • the preferred solution has an audio data representation which contains both network layout information, and DSP code defined as pre-compiled, binary-portable classes which tells the device's embedded synthesis engine exactly how to create its own 'instruments' with which to render the content.
  • the representation may be delivered (for example) within the SYStem Exclusive ("SYSEX") messages within standard MIDI files; this allows the basic audio content within the file to be rendered using legacy MIDI implementations; where the proposed engine is present on a device, that engine can be used to render the MIDI file according to the audio processing defined within the representation, and in which case the MIDI file would be rendered exactly as intended by the content author.
  • the representation may also contain network definition information on how to configure the underlying network of DSP modules to render the audio for each instrument within the composition.
  • the DSP code may be delivered in binary-portable format, meaning that instrument definitions only need to be translated at load-time to be used by the engine. Because the translation process is very efficient, and because the resultant native code is very efficient, the solution can be used to render audio consistently on a wide range of devices, provided the proposed engine is present.
  • the network definition need not be in the form of code; it could be represented in a hierarchical representational format which is interpreted by the engine — e.g. XML format within SYSEX messages in the MIDI file.
  • the instruments referenced by the network might be represented as either combinations of known "core" units which are pre-installed in the engine (in which case the representation does not have to contain new code for those units e.g. a default reverb unit, or a default square-wave based tone generator unit); or might be for combinations of completely new, arbitrary units which are defined in terms of class code which overrides fundamental engine classes (e.g. arbitrary tone generators, envelope generators, effects algorithms etc.); or any combination thereof.
  • the units are typically very flexible, with the ability to be reconfigured with parameterised information.
  • the same reverb unit could be used in two or more different ways from the same single code representation within one definition file; each instrument could use a different small parameter set to define how the "common" code would be used slightly differently in different contexts; this further reduces the size of the representation data.
  • pre-installed units can be leveraged, and because code can be used differently in different contexts within the same network by providing small parameterisation sets, and because the underlying code provides for compact representation of algorithms and class definitions, the representations can be very small even for complex sound representations.
  • the representation does not need to be compiled by the client, giving a very real performance benefit over approaches such as SAOL which require compilation into native code to deliver equivalent performance.
  • because the code is delivered as classes, which can extend the fundamental classes provided by the underlying audio system, the class definitions can be very small indeed.
  • a class could be provided to override the operation of the underlying reverb implementation for the purposes of the rendition of a piece of music.
  • the definition could include code that defines a number of new tone generators (e.g. particle synthesisers), which derive only from the fundamental underlying tone generator class.
  • the definition may be processed by the core synthesiser engine, which separates it into network topology and effect unit references. Any new code within the definition is extracted to working memory.
  • a set of communicating class instances is constructed that reflects the network topology, whereby inter-chained effect units are constructed (in the manner of a normal pipelined DSP system) from the appropriate combination of either pre-installed code in the core engine, or code which is supplied in the definition.
  • the engine then generates the audio required to render the description in the manner of a normal software synthesiser system (such as the SSEYO Koan Synth Engine).
  • the system is also configured to be changeable in real-time due to environmental influences, such as MIDI controller changes (which units may be configured to arbitrarily receive and process if required), or even fundamental real-time changes to network topology and/or units to use.
  • the core audio and graphics processing libraries for intent consist of classes and methods which execute basic audio and graphics processing capabilities. These typically include methods for playing a block (or sequence of blocks) of audio data, or for rendering graphics to a display area.
  • Figure 1 shows a digital signal processing pipeline
  • Figure 2 shows the preferred structure for data contained within the media stream
  • Figure 3 illustrates how the preferred system of the present invention may be integrated with the Tao intent Midi Output Manager ("MOM");
  • Figure 4 is a block diagram illustrating the preferred mode of operation of the present invention.
  • Figure 5 shows how Midi Event Data are processed in the preferred embodiment.
  • Figure 1 illustrates a conventional digital signal processing pipeline (sometimes called a network or a netlist) suitable for use with the present invention.
  • a media datastream 10 is supplied to a splitter or de-multiplexer 12 which splits the datastream up into graphics and audio components.
  • the graphic stream passes sequentially through a number of different object handlers or codecs ("Digital Signal Processing ("DSP") nodes") such as filters 14, 16 and a colour remapping unit 18.
  • the output is then passed onto a graphics device 20.
  • the audio stream, likewise, passes sequentially through a number of object handlers such as a pitch shifter 22, a reverb unit 24 and a filter 26.
  • the output datastream then passes to an audio device 28.
  • the topology of the network is defined by means of a Media Data Representation ("MDR"), which may either be predefined and fixed or may alternatively be supplied in real time as part of the input stream 10.
  • the network topology (for example the number of object handlers, their position in the sequence, the input parameters they need to operate and so on) need not be determined in advance but can be supplied and modified as required by the user.
  • When an application (not shown) makes a request to play a media file, it sends the network description to a stream manager (not shown) which then constructs a network on the basis of the MDR.
  • the stream manager then signals that it is ready to receive the input datastream 10; the datastream can either be passed to it or it may proactively go and "grab" it itself.
  • the datastream is then supplied to the object handlers in the appropriate order. This may either be done by the network itself, with each object calling the next one or the stream manager may retain control and may call the objects one by one as they are required.
  • the present invention allows the creator of the media stream not only to pre-define the topology of the network but also to modify that topology as the datastream is being consumed. This is achieved in the preferred embodiment by embedding the MDR directly in the stream.
  • the stream may also include extended DSP code which tells the client device exactly how to create its own DSP nodes with which to render the content.
  • the media stream will be rendered consistently regardless of the client platform, and without any need for the user to download additional modules in advance, nor to wait while those modules are compiled. This flexibility also means, of course, that there is no longer a requirement for the client device to include a compiler.
  • Figure 2 shows the preferred structure for the MDR (Media Data Representation) within the datastream.
  • the datastream includes information on the netlist configuration, which itself includes unit configuration information along with unit code. This is followed by information on the way in which the blocks are organised. After the netlist configuration information comes the content data.
  • the netlist configuration information may be embedded as required in any place within the datastream, thereby allowing the network to be configured in real time as the content data is consumed on the client device.
  • the unit configuration information may take the form of an XML description with embedded data or alternatively a compressed binary representation.
  • the unit code will typically be in binary.
  • the netlist configuration may refer to any combination of pre-installed and dynamically defined DSP unit classes, that is any valid combination according to the semantics of the network.
  • the datastream may be defined by the standard MIDI format. This format allows META data to be included.
  • the MDR is embedded within a SYSEX message.
  • MIDI is only one type of datastream that is suitable for use with the present invention.
  • the invention allows the content within the file to be rendered using legacy MIDI implementations.
  • that engine can be used to render the MIDI file according to the audio processing defined within the representation, so that the MIDI file will be rendered exactly as intended by the content author.
  • the DSP Units referenced by the Network Definition might be represented as either combinations of known "core" DSP Units which are pre-installed in the system (in which case the representation does not have to contain new code for those units e.g. a default reverb unit, or a default square-wave based tone generator unit, an opaque graphics filter); or might be for combinations of completely new, arbitrary units which are defined in terms of class code which overrides fundamental engine classes (e.g. arbitrary tone generators, envelope generators, effects algorithms, graphical filters etc.); or any combination thereof.
  • the network definition describes how to construct a network of inter-communicating DSP units that may be used to render the audio for each instrument within the composition.
  • the network definition describes how to construct a network of media streaming DSP units that are used to render any combination of either data that is pre-supplied in the MDR, or alternatively data that is provided automatically (perhaps via the internet).
  • the new DSP Unit code that may be embedded within the Network Definition is delivered in binary-portable format, meaning that DSP unit definitions only need to be translated and if appropriate proof-verified at load-time to be used by the rendering engine. Because the translation process is very efficient, and because the resultant native code is very efficient, and because the code can be proof-carrying and therefore always known to be safe to run, the solution can be used to render both audio and graphics consistently on a wide range of devices, provided the proposed engine is present.
  • the new DSP Unit code may only derive from known safe classes within the system; a system dictionary must be used to define which classes are safe to use, and which methods on those classes and which system tools are safe to invoke. Note that a new DSP Unit may derive from another new DSP Unit (etc. ...) provided that the base new DSP Unit properly subclasses a valid system DSP Unit.
  • the DSP Unit Code may either be supplied explicitly as binary data within the MDR, or it might be defined by reference (in which case, for example a URL could be used to retrieve the specified code).
  • the MDR could refer to code using an XML descriptor of the following form, where the URL could return a jar containing the code with the appropriate version update, that could get installed/cached on the device as appropriate depending on what version might (or might not) be installed already on the system.
  • the new DSP Unit code is always based on DSP Unit class code which is pre-defined within the system and known to be safe.
  • Each DSP Unit might either extend quite simple base classes (e.g. simple tone generator) or it might extend quite sophisticated classes (e.g. complex reverb system).
  • the system could also use component version dependency marking, to indicate dependencies on other code elements that might require dynamic updating of the underlying system software.
  • the DSP Units might represent such units as Tone Generators, LFOs, Chorus, Reverb, DLS-based sample playback, etc.
  • this code will be contained within a SYSEX of a MIDI data stream.
  • the DSP Units might represent decryption modules, codecs, filters or visualisation units.
  • DSP Units are typically very flexible, with the ability to be reconfigured with parameterised information that can either be supplied within the MDR or even be generated dynamically within the network itself.
  • the same reverb unit could be used in two or more different ways from the same single code representation within one definition file; each instrument could use a different small parameter set to define how the "common" code could be used slightly differently in different contexts; this further reduces the size of the representation within the MDR.
  • visualisation units may have parameters to modify their look, decryption units may have encryption keys provided, and codecs may have quality parameters.
  • pre-installed unit class code is always leveraged to a greater or lesser extent, and because code can be used differently in different contexts within the same network by providing small parameterisation sets, and because the underlying code provides for compact representation of algorithms and class definitions, the representations can be very small even for complex sound representations or complex graphical events.
  • a class could be provided to override the operation of the underlying reverb implementation for the purposes of the rendition of a piece of music.
  • the definition could include code that defines a number of new tone generators (e.g. particle synthesisers), which derive only from the fundamental underlying tone generator class.
  • the base unit class code for a filter would handle the basic data transfer, and the additional code would just manipulate the data.
  • the datastream will in many practical embodiments be supplied across a network, for example, across a wireless network in the context of mobile phones. However, this is not essential and the invention is equally applicable to a datastream which is supplied locally, for example from a file on a hard-drive, on a CD or on a DVD.
  • program code within the media content is delivered as Virtual Processor ("VP") -represented binary code.
  • client devices should have a suitable platform, such as Tao's intent™ system, allowing them to read such binary code.
  • the code could be supplied in any convenient widely-understood language such as Java.
  • Figure 3 illustrates how various applications might use the preferred embodiment (known as Vector Softsynth) within the context of the Tao intent Midi Output Manager (“MOM”) system residing on a device.
  • Each application on the device uses the services of the general MOM framework to route data through to a target MIDI rendering system.
  • the proposed new component is termed the "Vector Softsynth" (or “Extendible Modular Synthesizer”), which renders MIDI data using the proposed functionality. Audio data from all soft synths is delivered to one of a variety of potential destinations (depending on the device configuration).
  • Figure 4 shows how a class-based hierarchy can be used to deploy the Vector Softsynth within the above system.
  • This API class uses a class factory to create an instance of the appropriate renderer class, matching the rendering requirements of the application (this class might, or might not, be in a separate process depending on the needs of the application).
  • the base class for the renderer class is ave/iss/mom/r/mom/class.
  • the ave/iss/mom/r/soft/class software synth specialisation is further specialised by the Vector softsynth specialisation class which might for example be /ave/iss/mom/r/soft/imp/com/tao-group/vector/class.
  • the application can send MIDI data to the Vector softsynth class via the MOM framework using standard methods within the MOM client API.
  • the Vector softsynth must implement support for standard MOM class methods (including set-format and calculate), in order to respond to MIDI data within the file and to audio block rendering requests within the system (via the calculate method):
  • In response to requests for the system to render an audio data block, the system must check the system MIDI command queue for commands matching the audio block timestamp:
  • When a SYSEX command is found which represents a network definition, the synth network must be reconstructed dynamically in accordance with the MDR within the MIDI command. This requires the MDR to be parsed and the network of intercommunicating classes to be constructed (with proof verification of dynamically defined classes being performed as required).
  • When a MIDI command is found which can influence the state of the rendering system (e.g. note on/off commands), those commands must be supplied to all units within the network that might be affected (e.g. tone generators) such that those units can respond to the MIDI commands by changing state or internal properties (as appropriate).
  • Tone generators will prepare audio data in their output audio block in accordance with internal state reflecting recent note on/off events and other properties (and may perhaps refer to audio sample data within the system); filters will prepare audio data in their output audio block in accordance with source audio data blocks from other units, and other properties; various unit outputs (e.g. from LFOs) may affect arbitrary parameters within other units in the system. All units at the end of the system which are marked as being "output" units have their final outputs mixed together to create the final audio block output for the synthesizer.
  • Figure 5 illustrates how the Vector Softsynth processes MIDI event data.
  • As MIDI event data enters the system, the system must decide what to do with each item of data. If the data is a network description SYSEX command, then this information is used to create or modify the network of intercommunicating DSP units. If the data is not a network description SYSEX command, then the command is put in a timestamped MIDI event queue from which it is later absorbed as required to process audio blocks required for the output system. Audio blocks are rendered using the DSP network (see the sketch following this list).
  • Step 2 in the flowchart box of Figure 5 requires the system to process a Vector network description command which is contained within the MIDI system exclusive message; the command data may include compressed binary representations of XML.
  • An example representation is given in the table below. This represents a description of how to render an entire MIDI network. Note that the DSP code elements might either be embedded within the MDR itself, or could be referred to via a URL with optional version numbering and optional dependencies on other system component versions.
  • the optional midilines section details which DSP module description is to be used to render each MIDI line (note that if not present, a midiline is assumed to be rendered by the first listed module).
  • the optional classes section details new classes that are represented within the description, which are always based on classes that exist within the pre-installed system.
  • the mandatory modules section may contain one or more different module entries. Each module entry describes how sounds are to be generated for that module.
  • the information present within the module definition contains full and complete information for the DSP units that are present, how they are to be interconnected, and what parameter settings are for each effect unit within the module. Note that where parameter values are not defined, they are assumed to be the default values that are defined within the effect unit class code.
  • outputs from some modules are capable of being fed as inputs to other modules, and also used to modulate parameters of any other module.
  • Parameters may be assigned to respond dynamically to arbitrary incoming MIDI controller values (e.g. RPN, NRPN or SYSEX) that are present elsewhere in the MIDI file.
  • All of this goes into a SYSEX command in MIDI.
  • Within the MIDI command stream there might be commands which supply new parameter values (e.g. using MIDI NRPN, RPN or SYSEX controller messages).
  • the Vector synthesizer can interpret this information dynamically to respond to external MIDI controllers.
  • Effects units may contain any arbitrary digital signal processing logic. They may include tone generators, effects units such as reverberation systems, low-frequency oscillators, envelopes and many other possibilities. DSP Units may also be capable of performing higher level operations that would benefit from the high-performance binary-portable translatable code, such as automatic composition or automatic harmonisation.
  • At the control rate (e.g. 100 Hz), a tone generator's "calculate" method will cause it to consume note on/off events in the command queue for that module and render the generated tone to an output buffer.
  • For units such as reverb units, these will take input from the output stages of units which feed them within the network topology, and populate an output buffer based on the input audio data.
  • the final unit in the module is the one which provides output to the audio block (which is mixed with audio data produced by all other modules in the system). Note that to have more than one unit contribute to the final output, an adder unit may be used as the final unit in the module.
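
The calculate flow outlined in the bullets above can be sketched, very approximately, as in the following Java fragment; all class and method names here are illustrative stand-ins rather than the actual MOM or Vector Softsynth API.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Deque;
    import java.util.List;

    class VectorSoftsynthSketch {

        static class TimedCommand {
            long timestamp;            // when the command takes effect
            boolean isNetworkSysex;    // true if it carries a network definition (MDR)
            byte[] data;
        }

        interface Unit {
            void handleMidi(byte[] command);   // e.g. note on/off, controller changes
            void calculate(float[] out);       // fill one output audio block
            boolean isOutput();                // marked as an "output" unit?
        }

        private final Deque<TimedCommand> commandQueue = new ArrayDeque<>();
        private List<Unit> network = new ArrayList<>();

        void enqueue(TimedCommand cmd) { commandQueue.add(cmd); }

        // Called for each audio block the host requests.
        float[] calculate(long blockTimestamp, int blockSize) {
            // Absorb every queued command up to this block's timestamp.
            while (!commandQueue.isEmpty()
                    && commandQueue.peek().timestamp <= blockTimestamp) {
                TimedCommand cmd = commandQueue.poll();
                if (cmd.isNetworkSysex) {
                    network = rebuildNetworkFromMdr(cmd.data);   // parse, verify, reconstruct
                } else {
                    for (Unit u : network) u.handleMidi(cmd.data);
                }
            }
            // Mix the blocks produced by all units marked as outputs.
            float[] mix = new float[blockSize];
            float[] tmp = new float[blockSize];
            for (Unit u : network) {
                if (!u.isOutput()) continue;
                Arrays.fill(tmp, 0f);
                u.calculate(tmp);
                for (int i = 0; i < blockSize; i++) mix[i] += tmp[i];
            }
            return mix;
        }

        private List<Unit> rebuildNetworkFromMdr(byte[] mdr) {
            return new ArrayList<>();   // stub: the real engine constructs the unit instances here
        }
    }
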

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

An extendible media content rendering system which uses a novel Media Data Representation to define how an audio or media datastream is to be rendered on a client device such as a mobile phone. The system allows for media content authors to include in-stream instructions to reconfigure the digital signal processing network used by the client, in real time. It also allows authors to include within the media stream binary code which enables the client to generate new network nodes (for example audio instruments) as required. In an audio implementation, the new information may be contained within a SYSEX instruction of a data file encoded in MIDI format.

Description

Extendable Media Content Rendering System
The present invention relates to an extendible media content rendering system, and particularly although not exclusively to such a system for use with embedded devices such as mobile phones.
There is currently no way for content authors to create audio content (e.g. pieces of music, sounds, sound effects and spot effects) such that this audio content can be delivered with a very flexible set of sounds, and rendered consistently on a wide range of devices with a wide range of CPU capabilities. More generally, this is a problem with all types of media content.
Music delivered in electronic device-readable form typically falls into two types, 'score' format and 'audio' format. The former contains a symbolic list describing which notes to play. The second provides an encoding of an actual audio performance, i.e. one which represents the sound that the listener will actually experience.
An obvious example of the former is MIDI, and 'WAV' files or 'MP3' files are good examples of the latter. Both of these formats have their own intrinsic benefits and limitations. Audio formats attempt to encode all aspects of the listeners' experience (in much the same way as a track on an audio CD), but data in these formats tends to be large and costly in terms of bandwidth, and is entirely fixed in its representation. Score formats are small in size and therefore cheap on bandwidth, but suffer from a lack of information on how to perform them precisely.
Score formats attempt to get round this problem by incorporating data which specifies the 'instrument' to be used. For example a MIDI score specifies the General MIDI instrument to use to play a specific set of musical notes (e.g. "Grand Piano"). However, this relies on the playback device having a pre-installed set of instruments with which to perform (render) the score. The sound of each instrument set is not completely consistent across platforms; each MIDI platform has its own General MIDI set implementation, each of which can sound very different even when rendering the same MIDI score. This leads to inflexibility in that the score can only choose from among the instruments on the device; it cannot create its own instruments and send those with the score, and the content author cannot predict exactly how the music will sound on each platform which receives it. Note that certain formats (e.g. XMF) allow content authors to include specific sound samples and sample sets that can be used by the MIDI rendering system to render the audio with a flexible sound set; but these formats suffer from having large format sizes (they take up a lot of bandwidth) even when using samples in a compressed format (such as MP3).
A particular type of advanced score format, called Structured Audio Orchestra Language (SAOL), attaches high level code (describing sound synthesis models) to score format content. This gives flexibility in that the SAOL content developer can compose the score and design the instruments with which it is played. Csound, its forerunner, also works in this way, using '.csd' files to combine the score and the instrument algorithms in one file. Although playback and interaction with the instruments is in real-time, the synthesis algorithms have to be compiled before the sound can be rendered. This leads to very real problems in delivering such content to a range of target devices; a typical step in generating an SAOL piece might require it first being compiled to C++, then compiled to the native code of the local platform, at which point the audio can finally be rendered. This might therefore require a compiler to be present on the client device — a problem for embedded devices such as mobile phones. Csound is described at http://www.csounds.com/whatis/index.html. See also http://mitpress.mit.edu/e-books/csound/frontpage.html and http://new.math.uiuc.edu/audible/csound/howtorun.htm.
Audio plugin architectures such as VST exist which allow for pre-compiled code to be incorporated by audio authors into desktop authoring packages, for the purpose of achieving flexible sound manipulation, to create data in audio format (or sometimes to create sample data for use in score-based contexts). Such plugins must be pre-compiled into native code appropriate to the local device used for authoring content; the plugins are installed manually by content authors on their own authoring devices. The plugins used in such a content are used in-situ by the content authors; they are not required for customers to listen to the audio content which might be created using such tools.
The SSEYO Koan Interactive Audio platform allows for audio content to be expressed as "Vector Audio", which allows content to be expressed in a compact form which includes parameters defining how to use pre-installed audio synthesizer components within a flexibly defined network. The Koan Vector Audio solution allows for an arbitrary network of the pre-installed synth components to be pre-defined (together with information on notes to play, and also with parameters which define how to set specific controller values which are exposed by the pre-installed synth units). Koan Vector Audio may be delivered within a container file (e.g. MIDI files or as text within HTML files) if required.
A content provider will distribute a stream of data that represents media of some form (such as audio, video, text or animation). If this is in a standard format (such as mpeg2), then the content provider may rely on the client system (that will play the content) to be able to handle it.
To play such a media stream the client system will have a mechanism of obtaining the data, converting it to the required output format and presenting it to the user. This normally involves several steps. However, the currently installed mechanisms may not be suitable for various reasons:
• The content provider may want to change the way in which the media is processed. For instance, a frequency filter may be desired on an audio clip.
• To understand the content, a codec may be required that is not available on the client system.
• The content may be encrypted with a proprietary mechanism.
• The content provider may wish to add a specific form of visualisation to an audio clip.
Current media playing systems allow the downloading of additional codecs to allow them to play media in different formats.
This has many limitations:
• There is an additional step for the user to go through to install the codec for the media stream. This makes it harder for content providers to use many different codecs.
• As the codec is executable content, the user will have to trust that the codec is safe to use. This is generally dealt with by various forms of cryptographic signing, but can be hard for the user to understand. This also leaves the user at risk of viruses or other malicious code.
• The codec will likely come from a third party rather than the content provider. Thus the content provider must rely on the security and availability of the systems of this third party.
• If the codec has a number of versions, then it is hard for the content provider to ensure that a suitable version is available on the client system.
• Even if a suitable codec is available, there is no way for the content provider to specify the configuration of how the media is to be played.
According to the present invention there is provided a method of extending media content by embedding within content to be streamed program code or a reference to such code which, when said code is run by a client device receiving the media content, enables the creation on the client device of a program module for use in rendering the content.
According to another aspect of the present invention there is provided an extendible media content rendering system comprising a client device arranged to receive media content which has embedded within it program code, or a reference to such code, the device using the code to enable the creation of a program module for use in rendering the content.
The invention further extends more generally to a computer program, to a data format, and to a datastream. It further extends to a data format and to a computer program when stored on computer-readable media.
This invention provides a means of embedding extended signal processing code ("DSP code") into media content, where that content describes a network of media processing operations (i.e. it describes the topology of a "media streaming system"). The preferred solution has a Media Data Representation ("MDR") which contains both network topology information, and extended DSP code defined as pre-compiled, binary-portable code (expressed as translatable subclasses of pre-installed, fundamental system media classes that are known to be safe) which tells the device's media engine exactly how to create its own "DSP nodes" with which to render the content.
Signal processing code may be used to express the logic required to perform a set of DSP operations. Such operations are described as "DSP Units" (or "Units"), and each Unit might be used to:
- generate data (a data "source", such as a tone generator in an audio rendering system, a sequence of meta event information such as MIDI commands, a graphical data stream in a screen-saver style media system, a unit used to deliver graphical data either from a data area defined within the media data itself or delivered dynamically through a stream)
- filter data (including DSP operations such as filtering/chorus/reverb in an audio rendering system such as a software synthesiser, or graphical effects unit in a more general graphics media presentation)
The core media system may manage the inter-communicating network of DSP units to take source data / filter source data / sink data in the manner of a traditional media streaming system (including MIDI or audio rendering systems) to deliver resultant data to the target device or host "player" software system. The core media system might also be used to feed-in external data to the stream (for example, externally supplied MIDI event data or externally supplied graphical file data).
The core media system may contain a "library" of pre-installed DSP Units expressed as classes for the intent system.
The invention may (but need not) be used in conjunction with Tao Group's intent system which incorporates a binary-portable operating system ("Elate") with an extensive multimedia system providing a means of executing multimedia applications on a wide-variety of hardware platforms. These applications are written in VP ("Virtual Processor"), a binary portable code.
The invention preferably includes the ability to embed sound synthesis code into music content of a score-type format (such as MIDI). The preferred solution has an audio data representation which contains both network layout information, and DSP code defined as pre-compiled, binary-portable classes which tells the device's embedded synthesis engine exactly how to create its own 'instruments' with which to render the content. The representation may be delivered (for example) within the SYStem Exclusive ("SYSEX") messages within standard MIDI files; this allows the basic audio content within the file to be rendered using legacy MIDI implementations; where the proposed engine is present on a device, that engine can be used to render the MIDI file according to the audio processing defined within the representation, and in which case the MIDI file would be rendered exactly as intended by the content author.
The representation may also contain network definition information on how to configure the underlying network of DSP modules to render the audio for each instrument within the composition. The DSP code may be delivered in binary-portable format, meaning that instrument definitions only need to be translated at load-time to be used by the engine. Because the translation process is very efficient, and because the resultant native code is very efficient, the solution can be used to render audio consistently on a wide range of devices, provided the proposed engine is present.
Note that the network definition need not be in the form of code; it could be represented in a hierarchical representational format which is interpreted by the engine — e.g. XML format within SYSEX messages in the MIDI file.
The instruments referenced by the network might be represented as either combinations of known "core" units which are pre-installed in the engine (in which case the representation does not have to contain new code for those units e.g. a default reverb unit, or a default square-wave based tone generator unit); or might be for combinations of completely new, arbitrary units which are defined in terms of class code which overrides fundamental engine classes (e.g. arbitrary tone generators, envelope generators, effects algorithms etc.); or any combination thereof.
The units are typically very flexible, with the ability to be reconfigured with parameterised information. For example, the same reverb unit could be used in two or more different ways from the same single code representation within one definition file; each instrument could use a different small parameter set to define how the "common" code would be used slightly differently in different contexts; this further reduces the size of the representation data.
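By way of illustration only, the following Java sketch shows this kind of parameterised reuse: a single reverb unit class is shipped once and instantiated twice with different small parameter sets. The class, field and parameter names are invented for the sketch and are not the engine's actual API.

    import java.util.Map;

    class ReverbUnit {
        // Engine defaults; only values that differ travel with each instrument.
        private float roomSize = 0.5f;
        private float damping  = 0.3f;

        void configure(Map<String, Float> params) {
            roomSize = params.getOrDefault("roomSize", roomSize);
            damping  = params.getOrDefault("damping",  damping);
        }

        float[] process(float[] block) {
            // A real reverb would use roomSize/damping in a delay network;
            // here the parameters merely scale the signal.
            float[] out = new float[block.length];
            for (int i = 0; i < block.length; i++) out[i] = block[i] * roomSize * (1f - damping);
            return out;
        }
    }

    class InstrumentSetup {
        public static void main(String[] args) {
            ReverbUnit hallForStrings = new ReverbUnit();
            hallForStrings.configure(Map.of("roomSize", 0.9f, "damping", 0.2f));

            ReverbUnit roomForDrums = new ReverbUnit();
            roomForDrums.configure(Map.of("roomSize", 0.3f));   // damping keeps its default
        }
    }
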
Because pre-installed units can be leveraged, and because code can be used differently in different contexts within the same network by providing small parameterisation sets, and because the underlying code provides for compact representation of algorithms and class definitions, the representations can be very small even for complex sound representations.
The representation does not need to be compiled by the client, giving a very real performance benefit over approaches such as SAOL which require compilation into native code to deliver equivalent performance. Because the code is delivered as classes, which can extend on the fundamental classes provided by the underlying audio system, the class definitions can be very small indeed. For example, a class could be provided to override the operation of the underlying reverb implementation for the purposes of the rendition of a piece of music. Alternatively, the definition could include code that defines a number of new tone generators (e.g. particle synthesisers), which derive only from the fundamental underlying tone generator class.
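As a hedged illustration of such class-based extension, the Java sketch below defines a new tone generator and a reverb override that derive from hypothetical fundamental base classes; the base-class names and method signatures are assumptions, and the "particle synthesiser" body is reduced to a trivial sine voice.

    // Hypothetical fundamental classes assumed to be pre-installed in the engine.
    abstract class ToneGenerator {
        abstract void noteOn(int note, int velocity);
        abstract void noteOff(int note);
        abstract void calculate(float[] out);          // fill one audio block
    }

    abstract class Reverb {
        float roomSize = 0.5f;                          // engine default
        abstract void calculate(float[] in, float[] out);
    }

    // New tone generator delivered with the content; only the synthesis logic is
    // new (a trivial sine voice stands in for a real particle synthesiser).
    class ParticleSynth extends ToneGenerator {
        private double phase, freq;
        private boolean gate;

        @Override void noteOn(int note, int velocity) {
            freq = 440.0 * Math.pow(2.0, (note - 69) / 12.0);
            gate = true;
        }

        @Override void noteOff(int note) { gate = false; }

        @Override void calculate(float[] out) {
            for (int i = 0; i < out.length; i++) {
                out[i] = gate ? (float) Math.sin(phase) : 0f;
                phase += 2.0 * Math.PI * freq / 44100.0;
            }
        }
    }

    // Class overriding the stock reverb behaviour for one piece of music.
    class MyReverb extends Reverb {
        @Override void calculate(float[] in, float[] out) {
            for (int i = 0; i < in.length; i++) out[i] = in[i] * roomSize;   // placeholder DSP
        }
    }
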
The definition may be processed by the core synthesiser engine, which separates it into network topology and effect unit references. Any new code within the definition is extracted to working memory. A set of communicating class instances is constructed that reflects the network topology, whereby inter-chained effect units are constructed (in the manner of a normal pipelined DSP system) from the appropriate combination of either pre-installed code in the core engine, or code which is supplied in the definition. The engine then generates the audio required to render the description in the manner of a normal software synthesiser system (such as the SSEYO Koan Synth Engine). The system is also configured to be changeable in real-time due to environmental influences, such as MIDI controller changes (which units may be configured to arbitrarily receive and process if required), or even fundamental real-time changes to network topology and/or units to use. The core audio and graphics processing libraries for intent consist of classes and methods which execute basic audio and graphics processing capabilities. These typically include methods for playing a block (or sequence of blocks) of audio data, or for rendering graphics to a display area.
The invention may be carried into practice in a number of ways and one specific embodiment will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows a digital signal processing pipeline;
Figure 2 shows the preferred structure for data contained within the media stream;
Figure 3 illustrates how the preferred system of the present invention may be integrated with the Tao intent Midi Output Manager ("MOM");
Figure 4 is a block diagram illustrating the preferred mode of operation of the present invention; and
Figure 5 shows how Midi Event Data are processed in the preferred embodiment.
Figure 1 illustrates a conventional digital signal processing pipeline (sometimes called a network or a netlist) suitable for use with the present invention. A media datastream 10 is supplied to a splitter or de-multiplexer 12 which splits the datastream up into graphics and audio components. The graphic stream passes sequentially through a number of different object handlers or codecs
("Digital Signal Processing ("DSP") nodes") such as filters 14, 16 and a colour remapping unit 18. The output is then passed onto a graphics device 20. The audio stream, likewise, passes sequentially through a number of object handlers such as a pitch shifter 22, a reverb unit 24 and a filter 26. The output datastream then passes to an audio device 28.
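A minimal Java sketch of the audio branch of such a pipeline is given below, assuming a simple block-processing interface; the names and the placeholder lambdas stand in for the pitch shifter 22, reverb 24 and filter 26 of Figure 1 and are not taken from the patent.

    import java.util.List;

    interface DspNode {
        float[] process(float[] block);                 // transform one block of samples
    }

    class Pipeline {
        private final List<DspNode> nodes;
        Pipeline(List<DspNode> nodes) { this.nodes = nodes; }

        float[] render(float[] block) {
            for (DspNode n : nodes) block = n.process(block);   // 22 -> 24 -> 26
            return block;                                        // on to the audio device 28
        }
    }

    class PipelineDemo {
        public static void main(String[] args) {
            DspNode pitchShifter = b -> b;              // placeholder units
            DspNode reverb       = b -> b;
            DspNode filter       = b -> b;
            Pipeline audioBranch = new Pipeline(List.of(pitchShifter, reverb, filter));
            float[] rendered = audioBranch.render(new float[512]);
        }
    }
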
The topology of the network is defined by means of a Media Data Representation ("MDR"), which may either be predefined and fixed or may alternatively be supplied in real time as part of the input stream 10. Thus, the network topology (for example the number of object handlers, their position in the sequence, the input parameters they need to operate and so on) need not be determined in advance but can be supplied and modified as required by the user.
When an application (not shown) makes a request to play a media file, it sends the network description to a stream manager (not shown) which then constructs a network on the basis of the MDR. The stream manager then signals that it is ready to receive the input datastream 10; the datastream can either be passed to it or it may proactively go and "grab" it itself. The datastream is then supplied to the object handlers in the appropriate order. This may either be done by the network itself, with each object calling the next one or the stream manager may retain control and may call the objects one by one as they are required.
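The following sketch illustrates, under assumed names and types, the manager-driven variant described above: the stream manager builds the chain of object handlers from the MDR and then calls each handler in turn as chunks of the datastream arrive.

    import java.util.ArrayList;
    import java.util.List;

    interface ObjectHandler {
        byte[] handle(byte[] chunk);                    // one step in the chain
    }

    class StreamManager {
        private final List<ObjectHandler> network = new ArrayList<>();

        // Build the network from the handlers named in the MDR (already resolved
        // to pre-installed or newly created classes at this point).
        void buildNetwork(List<ObjectHandler> handlersFromMdr) {
            network.clear();
            network.addAll(handlersFromMdr);
        }

        // Manager-driven variant: the manager retains control and calls each
        // object handler in turn as chunks of the datastream arrive.
        byte[] consume(byte[] chunk) {
            for (ObjectHandler handler : network) chunk = handler.handle(chunk);
            return chunk;
        }
    }
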
The present invention allows the creator of the media stream not only to pre-define the topology of the network but also to modify that topology as the datastream is being consumed. This is achieved in the preferred embodiment by embedding the MDR directly in the stream. The stream may also include extended DSP code which tells the client device exactly how to create its own DSP nodes with which to render the content.
This is of particular interest where the media content is being consumed on an embedded device, such as a mobile phone which may have only a limited number of object handlers (DSP nodes). Provided that the client device has a certain minimum functionality, the media stream will be rendered consistently regardless of the client platform, and without any need for the user to download additional modules in advance, nor to wait while those modules are compiled. This flexibility also means, of course, that there is no longer a requirement for the client device to include a compiler.
The ability within the present invention to control in real-time the configuration of the network (and to send compiled code allowing new units to be generated on the fly) simplifies the requirements for media broadcasters. Rather than having to broadcast different media streams for different platforms, they can now broadcast just a single media stream which they can be sure will be rendered consistently by a wide variety of client devices (subject to a certain minimum level of functionality) regardless of the underlying client hardware.
Figure 2 shows the preferred structure for the MDR (Media Data Representation) within the datastream. As may be seen, the datastream includes information on the netlist configuration, which itself includes unit configuration information along with unit code. This is followed by information on the way in which the blocks are organised. After the netlist configuration information comes the content data. The netlist configuration information may be embedded as required in any place within the datastream, thereby allowing the network to be configured in real time as the content data is consumed on the client device. The unit configuration information may take the form of an XML description with embedded data or alternatively a compressed binary representation. The unit code will typically be in binary. For the avoidance of doubt, the netlist configuration may refer to any combination of pre-installed and dynamically defined DSP unit classes, that is any valid combination according to the semantics of the network. In one convenient embodiment the datastream may be defined by the standard MIDI format. This format allows META data to be included. In the preferred embodiment, the MDR is embedded within a SYSEX message. Of course, it will be understood that MIDI is only one type of datastream that is suitable for use with the present invention. Other types of data format, or indeed novel data formats, could also be used.
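A hedged sketch of how a client might separate MDR-carrying SYSEX messages from ordinary MIDI events follows; the manufacturer ID used to recognise the MDR payload and the payload layout are illustrative assumptions, since the text does not specify them.

    import javax.sound.midi.MidiEvent;
    import javax.sound.midi.MidiMessage;
    import javax.sound.midi.Sequence;
    import javax.sound.midi.SysexMessage;
    import javax.sound.midi.Track;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class MdrScanner {
        // Assumed marker for MDR-carrying SYSEX payloads (0x7D is the MIDI
        // "non-commercial" manufacturer ID); the patent does not specify one.
        static final int MDR_MANUFACTURER_ID = 0x7D;

        final Deque<MidiEvent> eventQueue = new ArrayDeque<>();

        void scan(Sequence sequence) {
            for (Track track : sequence.getTracks()) {
                for (int i = 0; i < track.size(); i++) {
                    MidiEvent event = track.get(i);
                    MidiMessage msg = event.getMessage();
                    if (msg instanceof SysexMessage sysex) {
                        byte[] payload = sysex.getData();
                        if (payload.length > 0 && (payload[0] & 0xFF) == MDR_MANUFACTURER_ID) {
                            applyNetlistConfiguration(payload);   // reconfigure the network
                            continue;
                        }
                    }
                    eventQueue.add(event);   // ordinary MIDI data: queue for rendering
                }
            }
        }

        void applyNetlistConfiguration(byte[] payload) {
            // parse unit configuration (XML or compressed binary) and unit code here
        }
    }
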
Where the system is used in conjunction with MIDI, the invention allows the content within the file to be rendered using legacy MIDI implementations.
Where the proposed engine is present on the device, that engine can be used to render the MIDI file according to the audio processing defined within the representation, so that the MIDI file will be rendered exactly as intended by the content author.
The DSP Units referenced by the Network Definition might be combinations of known "core" DSP Units which are pre-installed in the system (in which case the representation does not have to contain new code for those units, e.g. a default reverb unit, a default square-wave based tone generator unit, or an opaque graphics filter); or combinations of completely new, arbitrary units which are defined in terms of class code that overrides fundamental engine classes (e.g. arbitrary tone generators, envelope generators, effects algorithms, graphical filters, etc.); or any combination thereof.
Where the invention is used in connection with audio, the network definition describes how to construct a network of inter-communicating DSP units that may be used to render the audio for each instrument within the composition. Where the invention is used in connection with streaming media content, generally, the network definition describes how to construct a network of media streaming DSP units that are used to render any combination of either data that is pre-supplied in the MDR, or alternatively data that is provided automatically (perhaps via the internet).
The new DSP Unit code that may be embedded within the Network Definition is delivered in binary-portable format, meaning that DSP unit definitions only need to be translated and if appropriate proof-verified at load-time to be used by the rendering engine. Because the translation process is very efficient, and because the resultant native code is very efficient, and because the code can be proof-carrying and therefore always known to be safe to run, the solution can be used to render both audio and graphics consistently on a wide range of devices, provided the proposed engine is present. The new DSP Unit code may only derive from known safe classes within the system; a system dictionary must be used to define which classes are safe to use, and which methods on those classes and which system tools are safe to invoke. Note that a new DSP Unit may derive from another new DSP Unit (etc. ...) provided that the base new DSP Unit properly subclasses a valid system DSP Unit.
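A minimal sketch of such a system dictionary check is given below, in Java; the dictionary contents and method names are illustrative assumptions, although the class paths follow the style used elsewhere in this description.

import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a load-time check against a "system dictionary" of safe
// base classes and the methods that may be invoked on them.
public class SafeClassDictionarySketch {

    // Which base classes may be extended, and which of their methods may be called.
    static final Map<String, Set<String>> SAFE_CLASSES = Map.of(
            "/ave/iss/mom/unit/effect/reverb", Set.of("calculate", "set-format"),
            "/ave/iss/mom/unit/tonegenerator", Set.of("calculate", "set-format"));

    static boolean isAllowed(String baseClass, String invokedMethod) {
        Set<String> methods = SAFE_CLASSES.get(baseClass);
        return methods != null && methods.contains(invokedMethod);
    }

    public static void main(String[] args) {
        // A new unit deriving from the reverb base class and calling "calculate": accepted.
        System.out.println(isAllowed("/ave/iss/mom/unit/effect/reverb", "calculate"));
        // A unit attempting to invoke something outside the dictionary: rejected at load time.
        System.out.println(isAllowed("/ave/iss/mom/unit/effect/reverb", "openFile"));
    }
}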
The DSP Unit Code may either be supplied explicitly as binary data within the MDR, or it might be defined by reference (in which case, for example, a URL could be used to retrieve the specified code). Note that versioning techniques might also be used. For example, the MDR could refer to code using an XML descriptor of the following form, where the URL could return a jar containing the code with the appropriate version update, which could be installed/cached on the device as appropriate depending on what version might (or might not) already be installed on the system. The description could also define optional dependencies on other system component versions.

<?xml version="1.0"?>
<vectormidisystem>
  <midilines>
    <midiline line="1" module="1">
  </midilines>
  <classes>
    <class id="/ave/iss/mom/unit/effect/reverb/myreverb"
           baseon="/ave/iss/mom/unit/effect/reverb"
           version="1.23.4"
           src="http://mydomain/myeffect.cgi?com/tao_group/reverb&version=1.23.4">
      <unit t="reverb" c="1#1290,1.;1#1290,1.;1#1293,1."/>
      <notes>This class is based on the standard reverb class</notes>
    </class>
  </classes>
  <modules>
    <module id="1">
      <Units>
        <Unit id="3" class="/ave/iss/mom/unit/effect/reverb/myreverb">
          <notes>This class is based on the standard reverb class</notes>
          <inputs>
            <input>
              <fromunitid>1</fromunitid>
              <parameter>0</parameter>
            </input>
          </inputs>
          <parameters>
            <parameter>
              <parameterid>style</parameterid>
              <value>hall</value>
            </parameter>
          </parameters>
        </Unit>
      </Units>
    </module>
  </modules>
</vectormidisystem>
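Purely by way of example, the following Java sketch shows how a client might resolve such a reference: the jar for the requested version is fetched from the URL given in the descriptor and cached, and the download is skipped if that version is already installed. The cache layout and method names are assumptions.

import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: fetching referenced unit code from a URL and caching it
// per version, as suggested by the descriptor above.
public class CodeByReferenceSketch {

    static Path resolve(String classId, String version, URI src, Path cacheDir) throws Exception {
        Path cached = cacheDir.resolve(classId.replace('/', '_') + "-" + version + ".jar");
        if (Files.exists(cached)) {
            return cached;                           // this version is already installed/cached
        }
        Files.createDirectories(cacheDir);
        try (var in = src.toURL().openStream()) {
            Files.copy(in, cached);                  // download the jar for the requested version
        }
        return cached;
    }

    public static void main(String[] args) throws Exception {
        Path jar = resolve(
                "/ave/iss/mom/unit/effect/reverb/myreverb",
                "1.23.4",
                URI.create("http://mydomain/myeffect.cgi?com/tao_group/reverb&version=1.23.4"),
                Path.of("unit-cache"));
        System.out.println("unit code available at " + jar);
    }
}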
As an alternative to the code being proof-carrying, standard code-signing techniques could be used. Either the individual code elements could be signed, or the entire media file itself could be signed, thus indicating a meta-signing of the entire contents of the file. Where the code is proof-carrying, it simply needs to be proof-verified once and maintained within a system cache, without any need for a separate signature check on the code: the proof-carrying code is in itself sufficient for security.
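As an illustration of the code-signing alternative, the following Java sketch verifies a signature over either an individual code element or the entire media file using the standard java.security API; the choice of algorithm and the key handling are assumptions.

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

// Illustrative sketch only: verifying a digital signature over a code element, or a
// meta-signature over the entire media file, as an alternative to proof-carrying code.
public class CodeSigningSketch {

    static boolean verify(byte[] signedBytes, byte[] signature, PublicKey publisherKey)
            throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publisherKey);
        verifier.update(signedBytes);               // the code element, or the whole media file
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the content publisher's key pair.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair keys = kpg.generateKeyPair();

        byte[] mediaFile = "entire media file contents".getBytes();
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keys.getPrivate());
        signer.update(mediaFile);
        byte[] sig = signer.sign();                 // a meta-signature over the whole file

        System.out.println("signature valid: " + verify(mediaFile, sig, keys.getPublic()));
    }
}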
The new DSP Unit code is always based on DSP Unit class code which is pre-defined within the system and known to be safe. Each DSP Unit might either extend quite simple base classes (e.g. simple tone generator) or it might extend quite sophisticated classes (e.g. complex reverb system). The system could also use component version dependency marking, to indicate dependencies on other code elements that might require dynamic updating of the underlying system software.
In the context of audio, the DSP Units might represent such units as Tone Generators, LFOs, Chorus, Reverb, DLS-based sample playback, etc.
Typically this code will be contained within a SYSEX of a MIDI data stream. For Media Streaming content the DSP Units might represent decryption modules, codecs, filters or visualisation units.
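By way of illustration, the following Java sketch locates a system-exclusive message in a raw MIDI byte stream and extracts its payload as the embedded code/MDR block; the manufacturer identifier used to recognise the message is a placeholder.

// Illustrative sketch only: finding a SYSEX message in a raw MIDI byte stream and
// treating its payload as the embedded MDR / unit-code block.
public class SysexExtractionSketch {

    static final int SYSEX_START = 0xF0;
    static final int SYSEX_END = 0xF7;
    static final int VENDOR_ID = 0x7D;              // placeholder: the MIDI "non-commercial" ID

    // Returns the payload of the first matching SYSEX message, or null if none is found.
    static byte[] extractMdrPayload(byte[] midiBytes) {
        for (int i = 0; i < midiBytes.length; i++) {
            if ((midiBytes[i] & 0xFF) != SYSEX_START) continue;
            if (i + 1 >= midiBytes.length || (midiBytes[i + 1] & 0xFF) != VENDOR_ID) continue;
            for (int end = i + 2; end < midiBytes.length; end++) {
                if ((midiBytes[end] & 0xFF) == SYSEX_END) {
                    byte[] payload = new byte[end - (i + 2)];
                    System.arraycopy(midiBytes, i + 2, payload, 0, payload.length);
                    return payload;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] stream = {(byte) 0x90, 0x40, 0x40,                           // a note-on event
                         (byte) 0xF0, 0x7D, 0x01, 0x02, 0x03, (byte) 0xF7}; // SYSEX with payload
        byte[] mdr = extractMdrPayload(stream);
        System.out.println("MDR payload length: " + (mdr == null ? 0 : mdr.length));
    }
}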
DSP Units are typically very flexible, with the ability to be reconfigured with parameterised information that can either be supplied within the MDR or even be generated dynamically within the network itself.
For audio, the same reverb unit could be used in two or more different ways from the same single code representation within one definition file; each instrument could use a different small parameter set to define how the "common" code could be used slightly differently in different contexts; this further reduces the size of the representation within the MDR. For Media Streaming content, visualisation units may have parameters to modify their look, decryption units may have encryption keys provided, and codecs may have quality parameters.
Because pre-installed unit class code is always leveraged to a greater or lesser extent, and because code can be used differently in different contexts within the same network by providing small parameterisation sets, and because the underlying code provides for compact representation of algorithms and class definitions, the representations can be very small even for complex sound representations or complex graphical events.
For audio, the same reverb unit could be used in two or more different ways. For example, a class could be provided to override the operation of the underlying reverb implementation for the purposes of the rendition of a piece of music. Alternatively, the definition could include code that defines a number of new tone generators (e.g. particle synthesisers), which derive only from the fundamental underlying tone generator class. For Media Streaming content, the base unit class code for a filter would handle the basic data transfer, and the additional code would just manipulate the data.
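A small Java sketch of this kind of reuse is given below: a single unit class is instantiated twice with different small parameter sets, so only the parameters, and not the code, differ between the two contexts. The class itself is a stand-in; only the parameter names echo the XML examples.

import java.util.Map;

// Illustrative sketch only: one unit class, shared by reference, instantiated twice
// with different small parameter sets, so only the parameters are repeated.
public class SharedUnitParameterisationSketch {

    static class ReverbUnit {
        private final Map<String, String> parameters;
        ReverbUnit(Map<String, String> parameters) { this.parameters = parameters; }
        String describe() { return "reverb " + parameters; }
    }

    public static void main(String[] args) {
        // The same code representation, used slightly differently in two contexts.
        ReverbUnit forFirstInstrument  = new ReverbUnit(Map.of("style", "hall"));
        ReverbUnit forSecondInstrument = new ReverbUnit(Map.of("style", "hall", "level", "0.388"));
        System.out.println(forFirstInstrument.describe());
        System.out.println(forSecondInstrument.describe());
    }
}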
It will be understood of course that the datastream will in many practical embodiments be supplied across a network, for example, across a wireless network in the context of mobile phones. However, this is not essential and the invention is equally applicable to a datastream which is supplied locally, for example from a file on a hard-drive, on a CD or on a DVD.
We will now turn to the way in which the system of the present invention may be integrated within current systems supplied by Tao Group Limited, 62-63 Suttons Business Park, Early, Reading, Berkshire, RG6 1AZ, United Kingdom. In this particular implementation, program code within the media content is delivered as Virtual Processor ("VP")-represented binary code. It is of course to be understood that the client devices should have a suitable platform, such as Tao's intent™ system, allowing them to read such binary code. It goes without saying that the invention is not limited to that particular implementation: instead, the code could be supplied in any convenient widely-understood language such as Java.
Figure 3 illustrates how various applications might use the preferred embodiment (known as Vector Softsynth) within the context of the Tao intent Midi Output Manager ("MOM") system residing on a device. Each application on the device uses the services of the general MOM framework to route data through to a target MIDI rendering system. There are various potential rendering systems, including MIDI hardware based rendering (which sends data directly to the MIDI hardware device driver "/dev/midi"), and a traditional MIDI software synthesizer using sample-based data. The proposed new component is termed the "Vector Softsynth" (or "Extendible Modular Synthesizer"), which renders MIDI data using the proposed functionality. Audio data from all soft synths is delivered to one of a variety of potential destinations (depending on the device configuration).
Figure 4 shows how a class-based hierarchy can be used to deploy the Vector Softsynth within the above system. There is a hierarchy of classes, starting from the external API class ave/iss/mom/class, which is used by client applications to communicate with instances of a MOM server process. This API class is used to create an instance of the appropriate renderer class using a class factory to create the correct class type to match the rendering requirements of the application (this class might, or might not, be in a separate process depending on the needs of the application). The base class for the renderer class is ave/iss/mom/r/mom/class. For our proposed solution, the ave/iss/mom/r/soft/class software synth specialisation is further specialised by the Vector softsynth specialisation class which might for example be /ave/iss/mom/r/soft/imp/com/tao-group/vector/class.
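The following Java sketch illustrates, in outline only, a class factory selecting a renderer specialisation by class path; the Java types stand in for the intent/MOM classes, and only the class-path strings are taken from the description above.

// Illustrative sketch only: a class factory choosing a renderer specialisation by
// class path, mirroring the hierarchy described above.
public class RendererFactorySketch {

    interface MomRenderer { void sendMidi(byte[] midiData); }

    static class VectorSoftsynth implements MomRenderer {
        public void sendMidi(byte[] midiData) {
            System.out.println("Vector Softsynth rendering " + midiData.length + " MIDI byte(s)");
        }
    }

    static class HardwareRenderer implements MomRenderer {
        public void sendMidi(byte[] midiData) {
            System.out.println("forwarding to /dev/midi");
        }
    }

    static MomRenderer create(String rendererClassPath) {
        return switch (rendererClassPath) {
            case "/ave/iss/mom/r/soft/imp/com/tao-group/vector/class" -> new VectorSoftsynth();
            default -> new HardwareRenderer();
        };
    }

    public static void main(String[] args) {
        MomRenderer renderer = create("/ave/iss/mom/r/soft/imp/com/tao-group/vector/class");
        renderer.sendMidi(new byte[] {(byte) 0x90, 0x3C, 0x64});
    }
}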
Once the MOM renderer has been created, the application can send MIDI data to the Vector softsynth class via the MOM framework using standard methods within the MOM client API. In order to implement a solution meeting the requirements of this invention, the Vector softsynth must support the standard MOM class methods (including set-format and calculate), so that it can respond to MIDI data within the file and to audio block rendering requests within the system (via the calculate method):
In response to requests for the system to render an audio data block, the system must check the system MIDI command queue for commands matching the audio block timestamp:
• Create and modify the synth unit network as required:
When a SYSEX command is found which represents a network definition, the synth network must be reconstructed dynamically in accordance with the MDR within the MIDI command. This requires the MDR to be parsed and the network of intercommunicating classes to be constructed (with proof verification of dynamically defined classes being performed as required).
• Manage rendering of MIDI event data through that network:
When a MIDI command is found which can influence the state of the rendering system (e.g. note on/off commands), those commands must be supplied to all units within the network that might be affected (e.g. tone generators) such that those units can respond to the MIDI commands by changing state or internal properties (as appropriate).
• Pump rendered audio data through the network to the destination (some candidate destinations are shown, including directly to an audio device, to a streaming API, or to an audio mixer API):
In response to requests for the system to render an audio data block, the system must mix the requested block of audio. This is performed by calling the calculate method of each unit within the system in strict left-to-right processing priority order. Tone generators will prepare audio data in their output audio block in accordance with internal state reflecting recent note on/off events and other properties (and may perhaps refer to audio sample data within the system); filters will prepare audio data in their output audio block in accordance with source audio data blocks from other units, and other properties; various unit outputs (e.g. from LFOs) may affect arbitrary parameters within other units in the system. All units at the end of the system which are marked as being "output" units have their final outputs mixed together to create the final audio block output for the synthesizer.
Figure 5 illustrates how the Vector Softsynth processes MIDI event data. As MIDI event data enters the system, the system must decide what to do with each item of data. If data is a network description SYSEX command, then this information is used to create or modify the network of intercommunicating DSP units. If however the data is not a network description SYSEX command, then the command is put in a timestamped MIDI event queue from which it is later absorbed as required to process audio blocks required for the output system. Audio blocks are rendered using the DSP network.
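A minimal Java sketch of that dispatch decision is given below: network-description SYSEX commands rebuild the DSP unit network immediately, while all other commands are queued with their timestamps until the corresponding audio blocks are rendered. The types and method names are placeholders.

import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch only: the dispatch decision of figure 5. Network-description
// SYSEX commands reconfigure the DSP network; other events are queued by timestamp.
public class MidiEventDispatchSketch {

    record MidiEvent(long timestamp, boolean isNetworkDescriptionSysex, byte[] data) {}

    static class VectorSoftsynth {
        private final Queue<MidiEvent> eventQueue = new ArrayDeque<>();

        void onMidiEvent(MidiEvent event) {
            if (event.isNetworkDescriptionSysex()) {
                rebuildNetwork(event.data());       // parse the MDR and (re)build the DSP units
            } else {
                eventQueue.add(event);              // absorbed later while rendering audio blocks
            }
        }

        private void rebuildNetwork(byte[] mdr) {
            System.out.println("rebuilding DSP network from " + mdr.length + " MDR byte(s)");
        }

        int pendingEvents() { return eventQueue.size(); }
    }

    public static void main(String[] args) {
        VectorSoftsynth synth = new VectorSoftsynth();
        synth.onMidiEvent(new MidiEvent(0, true, new byte[] {0x01}));          // network description
        synth.onMidiEvent(new MidiEvent(10, false, new byte[] {(byte) 0x90})); // ordinary note event
        System.out.println("queued events: " + synth.pendingEvents());
    }
}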
Step "2." in the flowchart box of figure 5, requires the system to process a Vector network description command which is contained within the MIDI system exclusive message. There are many potential formats for such command data, including compressed binary representations of XML. An example representation is given in the table below. This represents a description of how to render an entire MIDI network. Note that the DSP code elements might either be embedded within the MDR itself, or could be referred to via a URL with optional version numbering and optional dependencies on other system component versions.
The optional midilines section details which DSP module description is to be used to render each MIDI line (note that if not present, a midiline is assumed to be rendered by the first listed module). The optional classes section details new classes that are represented within the description, which are always based on classes that exist within the pre-installed system. The mandatory modules section may contain one or more different module entries. Each module entry describes how sounds are to be generated for that module. The information present within the module definition contains full and complete information on the DSP units that are present, how they are to be interconnected, and what the parameter settings are for each effect unit within the module. Note that where parameter values are not defined, they are assumed to take the default values that are defined within the effect unit class code. Note that outputs from some modules are capable of being fed as inputs to other modules, and also used to modulate parameters of any other module. Parameters may be assigned to respond dynamically to arbitrary incoming MIDI controller values (e.g. RPN, NRPN or SYSEX) that are present elsewhere in the MIDI file. Note that if the command contains new DSP unit code in VP, then the code is first verified for safe operation, and rejected if not deemed safe. This ensures that the system is safely extendible, and that the user does not have to worry about playing content which extends the device using the suggested format. Note that DSP unit code must always be derived from one of the existing classes in the hierarchy, including for example
/ave/iss/mom/vector/reverb/class or /ave/iss/mom/vector/tonegenerator/class or /ave/iss/mom/vector/tonegenerator/sine/class.
<?xml version="1.0"?>
<vectormidisystem>
  <midilines>
    <midiline line="1" module="1">
    <midiline line="2" module="1">
    <midiline line="3" module="1">
    <midiline line="4" module="1">
    <midiline line="5" module="1">
    <midiline line="6" module="1">
    <midiline line="7" module="1">
    <midiline line="8" module="1">
    <midiline line="9" module="1">
    <midiline line="10" module="1">
    <midiline line="11" module="1">
    <midiline line="12" module="1">
    <midiline line="13" module="1">
    <midiline line="14" module="1">
    <midiline line="15" module="1">
    <midiline line="16" module="1">
  </midilines>
  <classes>
    <class id="/ave/iss/mom/unit/effect/reverb/myreverb"
           baseon="/ave/iss/mom/unit/effect/reverb">
      <notes>This class is based on the standard reverb class</notes>
      <vp>BINARY CODE GOES HERE IN SOME FORMAT</vp>
    </class>
  </classes>
  <modules>
    <module id="1">
      <Units>
        <Unit id="1" class="/ave/iss/mom/unit/tonegenerator/sine">
          <parameters>
            <parameter>
              <parameterid>Level</parameterid>
              <value>1</value>
            </parameter>
          </parameters>
        </Unit>
        <Unit id="2" class="/ave/iss/mom/unit/effect/lfo">
          <parameters>
            <parameter>
              <parameterid>shape</parameterid>
              <value>square</value>
            </parameter>
            <parameter>
              <parameterid>freqhz</parameterid>
              <value>200</value>
            </parameter>
            <parameter>
              <parameterid>level</parameterid>
              <value>0.388</value>
            </parameter>
            <parameter>
              <parameterid>ratio</parameterid>
              <midicontroller type="rpn" id="64" scaling="0.47"/>
            </parameter>
          </parameters>
        </Unit>
        <Unit id="3" class="/ave/iss/mom/unit/effect/reverb/myreverb">
          <notes>This class is based on the standard reverb class</notes>
          <inputs>
            <input>
              <fromunitid>1</fromunitid>
              <parameter>0</parameter>
            </input>
          </inputs>
          <parameters>
            <parameter>
              <parameterid>style</parameterid>
              <value>hall</value>
            </parameter>
          </parameters>
        </Unit>
      </Units>
    </module>
  </modules>
</vectormidisystem>
All of this goes into a SYSEX command in MIDI. To reiterate, in response to other commands in the MIDI command stream, there might be commands which supply new parameter values (e.g. using MIDI NRPN, RPN or SYSEX controller messages). The Vector synthesizer can interpret this information dynamically to respond to external MIDI controllers.
Effects units may contain any arbitrary digital signal processing logic. They may include tone generators, effects units such as reverberation systems, low-frequency oscillators, envelopes and many other possibilities. DSP Units may also be capable of performing higher-level operations that would benefit from the high-performance binary-portable translatable code, such as automatic composition or automatic harmonisation.
Signal processing in the system may be described as follows. Note that, to simplify the logic, when the network topology is constructed, units should be placed within the topology such that they are processed in strict priority order.
In response to a request to render an audio block:
For every module present in the system:
For each control-rate sized block in the audio block (control rate is e.g. 100Hz)
For every unit in the module in strict priority order:
Call the "calculate" method on each unit. This will have a different effect depending on the nature of the module. For example, a tone generator "calculate" method will cause it to consume note on/off events in the command queue for that module; and render to an output buffer the generated tone. For units such as reverb units, these will take input from the output stages from units which feed within the network topology, and populate an output buffer based on the input audio data. The final unit in the module is the one which provides output to the audio block (which is mixed with audio data produced by all other modules in the system). Note that to have more than one unit contribute to the final output, an adder unit may be used as the final unit in the module.

Claims
1. A method of extending media content by embedding within content to be streamed program code or a reference to such code which, when said code is run by a client device receiving the media content, enables the creation on the client device of a program module for use in rendering the content.
2. A method as claimed in claim 1 in which the program code is binary code.
3. A method of extending media content as claimed in claim 1 in which the program code is binary portable code for running on a hardware- independent rendering engine on the client device.
4. A method of extending media content as claimed in any one of the preceding claims in which the program code is arranged to enable the creation, when run on a client device, of a new node in a digital signal processing network.
5. A method of extending media content as claimed in any one of the preceding claims in which the program code is proof-carrying.
6. A method of extending media content as claimed in any one of claims 1 to 5 in which the program code is digitally signed.
7. A method of extending media content as claimed in any one of the preceding claims in which the program code is itself embedded within the media content.
8. A method of extending media content as claimed in any one of claims 1 to 6 in which the media content has embedded within it a link (e.g. a URL) to the program code to enable the client device to obtain the code as it receives the media content.
9. A method of extending media content as claimed in any one of the preceding claims in which the program code includes component version dependency marking.
10. A method of extending media content as claimed in any one of the preceding claims in which the media content is defined according to the MIDI standard.
11. A method of extending media content as claimed in claim 10 in which the program code or the reference to the program code is contained within a MIDI SYSEX message.
12. A method of extending media content as claimed in any one of the preceding claims including further embedding within the media content to be streamed network definition information, or a reference to such information, which defines for the client device characteristics of a or the digital signal processing network for rendering the content.
13. A method of extending media content as claimed in claim 12 in which the network definition information comprises data, not program code.
14. A method of extending media content as claimed in claim 10 or any one of claims 11 to 13 when dependent upon claim 10 in which the network definition information, or the reference to such information, is also included in a MIDI SYSEX message.
15. A method of extending media content as claimed in claim 14 in which the program code or reference, and the network definition information or reference, are contained within a common SYSEX message.
16. An extendible media content rendering system comprising a client device arranged to receive media content which has embedded within it program code, or a reference to such code, the device using the code to enable the creation of a program module for use in rendering the content.
17. A system as claimed in claim 16 in which the program code is binary code.
18. A system as claimed in claim 16 in which the program code is binary-portable code, the said code being arranged to run on a standard hardware-independent rendering engine on the client device.
19. A system as claimed in any one of claims 16 to 18 in which the running of the program code on the client device enables the creation of a new node in a digital signal processing network.
20. A system as claimed in any one of claims 16 to 19 in which the program code is proof-carrying, and in which the client device carries out a proof validation check on receipt.
21. A system as claimed in any one of claims 16 to 20 in which the program code is digitally signed and in which the client device checks the signature on receipt.
22. A system as claimed in any one of claims 16 to 21 in which the program code is itself embedded within the media content.
23. A system as claimed in any one of claims 16 to 21 in which the media content has embedded within it a link (e.g. a URL) to the program code, the client device using the link to obtain the code from a remote location.
24. A system as claimed in any one of claims 16 to 23 in which the program code includes component version dependency marking.
25. A system as claimed in any one of claims 16 to 24 in which the media content is defined according to the MIDI standard.
26. A system as claimed in claim 25 in which the program code or the reference to the program code is contained within a MIDI SYSEX message.
27. A system as claimed in any one of claims 16 to 26 in which further embedded within the media content is network definition information, or a reference to such information, the client device being arranged to use the said information to determine the characteristics of a or the digital processing network used for rendering the content.
28. A system as claimed in claim 27 in which the network definition information comprises data, not program code.
29. A system as claimed in claim 25 or any of claims 26 to 28 when dependent upon claim 25 in which the network definition information, or the reference to such information, is also included in a MIDI SYSEX message.
30. A system as claimed in claim 29 in which the program code or reference, and the network definition information or reference, are contained within a common SYSEX message.
31. Media content having embedded within it program code or a reference to such code which, when said code is run by a client device receiving the media content, enables the creation on the client device of a program module for use in rendering the content.
32. A computer-readable medium on which is stored media content as claimed in claim 31.
33. An electronic datastream representative of media content as claimed in claim 31.
34. A computer program which, when run on a digital computer, implements a method as claimed in any one of claims 1 to 15.
35. A method of rendering extendible media content, comprising receiving media content which has embedded within it program code, or a reference to such code, executing the code, using the executed code in the creation of a rendering program module, and rendering the content using the rendering program module.
PCT/GB2005/003487 2004-09-10 2005-09-09 Extendible media content rendering system WO2006027605A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0420180.2 2004-09-10
GB0420180A GB0420180D0 (en) 2004-09-10 2004-09-10 Extendible media content rendering system

Publications (2)

Publication Number Publication Date
WO2006027605A2 true WO2006027605A2 (en) 2006-03-16
WO2006027605A3 WO2006027605A3 (en) 2007-05-10

Family

ID=33186849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/003487 WO2006027605A2 (en) 2004-09-10 2005-09-09 Extendible media content rendering system

Country Status (3)

Country Link
CN (1) CN101103334A (en)
GB (1) GB0420180D0 (en)
WO (1) WO2006027605A2 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020188841A1 (en) * 1995-07-27 2002-12-12 Jones Kevin C. Digital asset management and linking media signals with related data using watermarks
US20020141741A1 (en) * 2001-03-29 2002-10-03 Han Zou Universal multimedia optic disc player and its application for revocable copy protection
US20040103207A1 (en) * 2002-11-22 2004-05-27 Elman Joshua E Method and apparatus for distributing binary presentations within digital media content files
EP1427170A2 (en) * 2002-12-02 2004-06-09 Microsoft Corporation Peer-to-peer content broadcast mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2083422A1 (en) 2008-01-28 2009-07-29 Fortium Technologies Ltd. Media modelling
WO2010088131A1 (en) * 2009-01-29 2010-08-05 Qualcomm Incorporated Dynamically provisioning a device with audio processing capability
US8532714B2 (en) 2009-01-29 2013-09-10 Qualcomm Incorporated Dynamically provisioning a device with audio processing capability
US8805454B2 (en) 2009-01-29 2014-08-12 Qualcomm Incorporated Dynamically provisioning a device
CN111566724A (en) * 2017-12-18 2020-08-21 字节跳动有限公司 Modular automatic music production server
US11610568B2 (en) * 2017-12-18 2023-03-21 Bytedance Inc. Modular automated music production server

Also Published As

Publication number Publication date
GB0420180D0 (en) 2004-10-13
CN101103334A (en) 2008-01-09
WO2006027605A3 (en) 2007-05-10

Similar Documents

Publication Publication Date Title
JP3770616B2 (en) Object-oriented video system
US7126051B2 (en) Audio wave data playback in an audio generation system
US20020143413A1 (en) Audio generation system manager
US20020161462A1 (en) Scripting solution for interactive audio generation
US20020143547A1 (en) Accessing audio processing components in an audio generation system
JPH09502821A (en) Object-oriented audio system
JPH1173182A (en) System for forming, distributing, storing and executing music work file and method therefor
JPH09503070A (en) Object-oriented MIDI system
JPH09503321A (en) Multimedia player component object system
US5902947A (en) System and method for arranging and invoking music event processors
JP2001519072A (en) Sound authoring system and method
Didkovsky et al. Maxscore: Music notation in max/msp
Pope Machine tongues XV: Three packages for software sound synthesis
Goyal Pro Java ME MMAPI: mobile media API for java micro edition
WO2006027605A2 (en) Extendible media content rendering system
US7386356B2 (en) Dynamic audio buffer creation
Hermann et al. Sc3nb: A Python-SuperCollider Interface for Auditory Data Science
WO2022143530A1 (en) Audio processing method and apparatus, computer device, and storage medium
CN104269185A (en) Method and system for realizing sound mixing play in Java virtual machine
KR101468411B1 (en) Apparatus for playing and editing MIDI music and Method for the same with user orientation
US7089068B2 (en) Synthesizer multi-bus component
Kleimola Design and implementation of a software sound synthesizer
Wyse A sound modeling and synthesis system designed for maximum usability
GB2430854A (en) Control of music processing
Dorigatti et al. DESIGNING A LIBRARY FOR GENERATIVE AUDIO IN UNITY

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 200580030123.9

Country of ref document: CN

NENP Non-entry into the national phase in:

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase