GB2532034A - A 3D visual-audio data comprehension method - Google Patents
A 3D visual-audio data comprehension method
- Publication number
- GB2532034A GB1419740.4A GB201419740A
- Authority
- GB
- United Kingdom
- Prior art keywords
- data element
- shows
- sound
- data
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims description 35
- 230000000007 visual effect Effects 0.000 claims abstract description 46
- 238000004590 computer program Methods 0.000 claims description 4
- 230000003287 optical effect Effects 0.000 claims description 3
- 238000003860 storage Methods 0.000 claims description 3
- 239000010410 layer Substances 0.000 description 58
- 230000000694 effects Effects 0.000 description 33
- 230000015572 biosynthetic process Effects 0.000 description 12
- 239000000203 mixture Substances 0.000 description 12
- 238000003786 synthesis reaction Methods 0.000 description 12
- 238000004519 manufacturing process Methods 0.000 description 9
- 239000004020 conductor Substances 0.000 description 8
- 230000033001 locomotion Effects 0.000 description 8
- 239000002344 surface layer Substances 0.000 description 8
- 238000004458 analytical method Methods 0.000 description 7
- 238000010586 diagram Methods 0.000 description 7
- 238000005259 measurement Methods 0.000 description 7
- 238000004091 panning Methods 0.000 description 7
- 230000007613 environmental effect Effects 0.000 description 6
- 239000000463 material Substances 0.000 description 6
- 239000007787 solid Substances 0.000 description 6
- 230000001419 dependent effect Effects 0.000 description 5
- 230000005484 gravity Effects 0.000 description 5
- 238000002156 mixing Methods 0.000 description 5
- 230000010363 phase shift Effects 0.000 description 5
- 239000003086 colorant Substances 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 239000007788 liquid Substances 0.000 description 4
- 230000010355 oscillation Effects 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 241001342895 Chorus Species 0.000 description 3
- 230000006399 behavior Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000003111 delayed effect Effects 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 230000005389 magnetism Effects 0.000 description 3
- 238000005192 partition Methods 0.000 description 3
- 238000001228 spectrum Methods 0.000 description 3
- 239000000654 additive Substances 0.000 description 2
- 230000000996 additive effect Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 239000007789 gas Substances 0.000 description 2
- 238000002844 melting Methods 0.000 description 2
- 230000008018 melting Effects 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 230000000704 physical effect Effects 0.000 description 2
- 230000001020 rhythmical effect Effects 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000009835 boiling Methods 0.000 description 1
- 238000001816 cooling Methods 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- 230000005611 electricity Effects 0.000 description 1
- 238000000537 electroencephalography Methods 0.000 description 1
- 238000001704 evaporation Methods 0.000 description 1
- 230000001815 facial effect Effects 0.000 description 1
- 230000005057 finger movement Effects 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 210000000245 forearm Anatomy 0.000 description 1
- 238000007710 freezing Methods 0.000 description 1
- 230000008014 freezing Effects 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 230000001795 light effect Effects 0.000 description 1
- 208000013469 light sensitivity Diseases 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 239000011368 organic material Substances 0.000 description 1
- 230000001151 other effect Effects 0.000 description 1
- 230000010287 polarization Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000033764 rhythmic process Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 239000002023 wood Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/40—Visual indication of stereophonic sound image
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/281—Reverberation or echo
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/295—Spatial effects, musical uses of multiple audio channels, e.g. stereo
- G10H2210/301—Soundscape or sound field simulation, reproduction or control for musical purposes, e.g. surround or 3D sound; Granular synthesis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/295—Spatial effects, musical uses of multiple audio channels, e.g. stereo
- G10H2210/305—Source positioning in a soundscape, e.g. instrument positioning on a virtual soundstage, stereo panning or related delay or reverberation changes; Changing the stereo width of a musical source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/091—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith
- G10H2220/101—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/091—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith
- G10H2220/101—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters
- G10H2220/131—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters for abstract geometric visualisation of music, e.g. for interactive editing of musical parameters linked to abstract geometric figures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/401—3D sensing, i.e. three-dimensional (x, y, z) position or movement sensing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/471—General musical sound synthesis principles, i.e. sound category-independent synthesis methods
- G10H2250/511—Physical modelling or real-time simulation of the acoustomechanical behaviour of acoustic musical instruments using, e.g. waveguides or looped delay lines
- G10H2250/531—Room models, i.e. acoustic physical modelling of a room, e.g. concert hall
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04H—BROADCAST COMMUNICATION
- H04H60/00—Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
- H04H60/02—Arrangements for generating broadcast information; Arrangements for generating broadcast-related information with a direct linking to broadcast information or to broadcast space-time; Arrangements for simultaneous generation of broadcast information and broadcast-related information
- H04H60/04—Studio equipment; Interconnection of studios
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/13—Aspects of volume control, not necessarily automatic, in stereophonic sound systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Processing Or Creating Images (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A visual display (on e.g. a touch screen device) allows visual data elements (e.g. 3D shapes or voxels, fig. 11) representing objects, e.g. waveforms (fig. 12) or musical phrases (fig. 16), to be manipulated via a user's gestures (e.g. screen swipes) in order to modify the data element for aural output. Figures 2 & 3 show the large sphere being made louder (i.e. coming closer to a listener) and three smaller spheres becoming quieter (more distant). The data elements may form part of a soundstage for live performance.
Description
DATA COMPREHENSION METHOD
The present invention relates to a data comprehension method comprising a method of inputting data in a first format and outputting the data aurally. In particular, the invention relates to a method and practice of electronic sound/physical modelling synthesis; whereby the sound is synthesised by interlinked visual and physical variables, which may or may not be in 3-dimensions (3-D), and may or may not be in direct relationship with 3D sound modelling via a 3D graphical user interface (GUI).
One type of musical performance involves playing back recorded music and sounds in a new "mix." The performer is commonly called a disk jockey or DJ. A disk jockey relies on many tools to create a musical performance. For example, vinyl records may be manipulated on multiple turntables, or multiple compact disks (CDs) may be manipulated, to create a variety of sound effects. The disk jockey may vary the speed, pitch, direction, and sequence of playback and may apply a variety of effects to the played back music.
More recently, disk jockeys have been using tools that manipulate audio data stored in data files on computers. Some of these tools have been developed based on the idea of computerizing what is usually done with turntables, vinyl records, and CD players. However, the flexibility of the computer enables new tools and techniques to be explored for use in musical performances.
Another type of musical performance involves composing, synthesising, sequencing, or ordering, individual digital/electronic or recorded musical parts (varying in length from single beats to entire pieces of music), over time, in a live presentation (set, show, concert, event, etc.). The performer is commonly called an Electronic Dance Music artist (EDM artist). An EDM artist has access to all the same control parameters available to DJs, with more freedom and control over the creation of the individual tracks being played, known as electronic compositions.
Another type of musical performance involves individual musical parts being performed by musicians playing (commonly traditional) musical instruments by following transcribed sheet music. The sheet music contains musical information known as a composition, which is written by a composer. The performance of a composition is led and directed by what is commonly called a conductor. In the context of an entire presentation, a DJ, EDM artist, or conductor makes decisions on the sequence of pieces of music over the course of the show. In this context, the artistic value relates more to curation than to creation.
In the creation of an individual musical piece, a composer transcribes musical parts, which are scored to represent where/when they will be played in the overall musical piece. Each piece (or section) is given a tempo, time signature, and a musical key. Each part is written in sync with the overall tempo, time signature, and key of the piece, so that each individual part fits with the rhythmic, harmonic, and melodic structure of the overall piece. In a classical performance context, the transcript of the compositional parts would be distributed to each of the separate instrumentalists, and a conductor would instruct each instrumentalist where to sit (soundstaging), when and how to play, and when to stop. Each conductor adds his own personal artistry to each composition. In the practices of live popular music, the role of the conductor is omitted. The composition and parts are rehearsed and usually performed, or improvised, without formal direction or use of sheet music (transcripts). In the practices of electronic music production, the separate instrumental parts are arranged on a timeline sequence displayed via a graphics engine, usually on a 2-dimensional graphical user interface. This sequence is similar to classical sheet music. The sequence is first given a set tempo and time signature, and then parts are added to the timeline, usually in individual rows, to compose a piece of music. This piece is usually exported as a digital media file, which is then presented, usually by a DJ, as a piece of recorded music. There is no conductor involvement. In the practices of live electronic production, parts are arranged in 2-dimensional rows and columns. However, an overall piece of music is not pre-composed. Individual parts are composed, but not pre-sequenced to complete an entire piece of music. The parts are stored as individual loops, which are automatically analysed and processed to match any tempo and time signature set by the composer. These loops are sequenced in real-time during a presentation of music. This is similar to a conductor of music, without the physical placement/positioning of instrumental parts on stage (known as soundstaging).
Soundstaging is the act of creating, or creating the perception of, a soundstage in the presentation of a piece, or pieces, of music. In a traditional context, soundstaging is the arrangement of instrumentalists on stage, in front of a conductor and/or audience, and how the audience perceives the arrangement. This perception is usually created by the physical placement of the instruments, i.e. left or right, rows back or forth, and sometimes height or depth (if they are situated in the orchestra pit). Modern technology allows more control over the delivery of the soundstage via systems comprising microphones, mixing desks, and loudspeakers. Using individual volume, equalisation, and panning controls, as well as special sound effects (such as reverb), these systems allow for live control of the overall mix of the sound, as perceived by the audience.
A similar principle is applied to recorded music and electronic music production, where tracks are mixed down to project the desired soundstage to the listener. For example, for a pop track, the vocalist will be at the front/centre of the soundstage, whereas, for a house/techno track, the kick drum will be at the front/centre of the soundstage.
In comparison to traditional soundstages, mixing the soundstage of a track using modern technology is mechanical and linear, and less intuitive to both the mix engineer and recipient(s) of the sound(s), due to the use of faders and knobs to control specific parameters, as opposed to the physical placement of instrumentalists.
The creation of sound can be organic or synthetic. Organic sound production involves traditional instruments made from organic material i.e. wood and metal. Synthetic sound production involves electronic instruments, or synthesisers, made from electronic components and computer processors. Sounds produced by traditional musical instruments are embodied by the shape, function, and material of the instrumental object, and are recognised equally by their visual form.
Electronic synthesis can reform the previous proposition, but limitations are defined by current music theories, and although sampling is used to add variation, digitally processed sound is not embodied or connected to any physical context. Users with less information, prior knowledge, and experience, create sounds that lack musical refinement due to the required knowledge transfer being unavailable.
Such a knowledge transfer is given with a traditional background in music theory, which most audio platforms are fundamentally based on. The traditional approach affords audible and visual refinement of sound, providing multi-dimensional feedback, so that an audience might also identify and relate to the music being produced.
Currently available software might require a level of knowledge that would put off the general audience; this might be, for example, developing an understanding of the GUI, reduced sensory feedback, unintuitive and arbitrary function, and lack of natural context. Current technology restricts output through alienation and exclusivity, for example, 2-dimensional motion input devices (i.e. computer mouse), text entry interfaces (computer keyboard), arbitrary 2-dimensional GUIs, MIDI keyboard controllers, or even keyboards on a synthesis platform - there are variables on the latter, but these are not all that fluid.
Physical modelling synthesis realises a degree of natural context by providing sets of real-world variables, such as the material of the instrument to be synthesised and the type of gesture used to strike said instrument. This data then translates into relative synthesised sound. This method is limited to predefined scenarios and known sounds, and lacks the provision for synthesising original complex frequencies/sounds.
In addition, current technologies are limited in the type of data they can use as an input for mixing and manipulating sound outputs, and rely heavily on using sound inputs. Other types of inputs are just not envisaged, and therefore data outputs are limited to being aural, and merely being variations of the sounds input to such a system.
The present invention aims to address these problems by providing a data comprehension method, comprising inputting at least a first data element; displaying the first data element in a visual format; manipulating the first data element in the visual format using pre-defined gestures to create a modified data element; and outputting the modified data element aurally.
One advantage of the present invention is to be able to be virtually present within the audio mix visually, and navigate around the mix in real-time, at a real-world scale.
Preferably the method further comprises the steps of assigning a first object to the first data element; displaying the first data element visually in the form of the first object, such that the pre-defined gestures manipulate the object; and assigning an aural component to the first object.
Preferably, when more than a first data element is input, assigning an object to each data element and assigning an aural component to each object.
Preferably, when the object is manipulated, the aural component varies in accordance with the manipulation.
Preferably, the first data element represents a physical, optical, tactile, odorous or aural variable.
Preferably, the modified data element is output aurally via at least one sound emitting device.
Preferably, the first data element represents a sound, and the modified data element forms part of a soundstage.
The visual format may be in 3-D, in which case the gestures are enacted in 3-D.
Alternatively, the visual format is in 2-D.
Preferably, the visual format is displayed on a touch-sensitive screen device. Alternatively, the visual format is displayed on a screen, and the first data element is manipulated via a remote touch sensitive device.
Preferably, the object is rendered in the form of voxels, and the gesture influences the voxel nodes.
The present invention also provides a computer program product adapted to perform the above method.
The present invention also provides a computer-readable storage medium or data carrier comprising the computer program.
The present invention will now be described by way of example only, and with reference to the accompanying drawings, in which: Figure 1 is a flow chart illustrating the method of the present invention; Figure 2 illustrates how the object position along the Z-axis determines the intensity (loudness) of the sound output; Figure 3 illustrates how the object position along the X-axis determines the stereo width (panning) left and right of the sound output; Figure 4 shows the outer side partitions of the 3D environment used to isolate object(s) within a partitioned area to a specific sound output; Figure 5 illustrates grouped together soundstages, with objects set within each environment and the current active soundstage in the centre; Figure 6 illustrates map navigation and how the current active soundstage changes dependent on location data; Figure 7 illustrates an active soundstage can be anywhere on the map; Figure 8a shows waveforms A and B in phase; Figure 8b shows waveform B half a cycle forward of waveform A; Figure 8c shows waveform B half a cycle behind waveform A; Figure 8d shows waveforms A and B completely out of phase; Figure 9a shows a sine wave waveform cycle; Figure 9b shows a square wave waveform cycle; Figure 9c shows a triangle wave waveform cycle; Figure 9d shows a sawtooth wave waveform cycle; Figure 10a shows a 3D spherical object; Figure 10b shows a 3D cubic object; Figure 10c shows a 3D conal object; Figure 10d shows a 3D pyramidal object; Figure 11a shows a cube that has almost been transformed into a sphere; Figure 11b shows a cube that has been rounded at the edges; Figure 11c shows a cube that has been indented on each of the sides from the centre; Figure 11d shows a cube that has been fully rounded on all vertical sides to form a cylinder; Figure 11e shows a cube that has been tilted to one side to form a parallelogram shape; Figure 11f shows a cube that has had one side tilted to form a trapezium shape prism; Figure 11g shows a cube that has been formed into a hexagonal prism; Figure 11h shows a cube that has been formed into a multi-sided diamond-like shape; Figure 11i shows a cube that has been rounded at the edges and corrugated on all sides; Figure 11j shows a cube that has been formed into a curved pyramid; Figure 11k shows a cube that has been formed into an indented cone; Figure 12a shows an equally combined sine waveform and square waveform; Figure 12b shows a square waveform with rounded corners; Figure 12c shows an indented square wave; Figure 12d shows a square wave with a curved top; Figure 12e shows a square wave that is ramped at the start; Figure 12f shows a square wave that is sloped to end; Figure 12g shows a square wave that is ramped at start and end; Figure 12h shows a square wave that has been ramped at the start and sloped to end; Figure 12i shows a square wave with ripples at the peak, which may produce a damped oscillation; Figure 12j shows a square wave that has a ramped curve at the start to peak and a sloped end; Figure 12k shows a square wave that has a ramped curve from the start to the end; Figure 13 shows two different complex objects; Figure 14 shows an object and how an object's [3D] dimensions relate to the Cartesian coordinate system; Figure 15 shows different combinations of cross-section slices of different object surface layer textures; Figure 16a shows a monophonic texture pattern traditionally notated for sheet music; Figure 16b shows a heterophonic texture pattern traditionally notated for sheet music; Figure 16c shows a
homophonic texture pattern traditionally notated for sheet music; Figure 16d shows a homorhythmic texture pattern traditionally notated for sheet music; Figure 16e shows a polyphonic texture pattern traditionally notated for sheet music; Figure 17 shows a cross-section slice of an object, which displays each of the layers within the object and their relative distances between each layer; Figure 18 shows a standard mobile media device, including built in camera, microphone, and light sensor; Figure 19 shows a diagram of the magnetic interaction relationship between objects; and Figure 20 shows a diagram of an example speaker array completely surrounding a dance floor from all angles/directions.
The present invention utilises the input of at least a first data element, for example, a colour or a sound, and displays this first data element in a visual format, for example, as an image on the screen of a smart phone. This visual display allows the manipulation of the first data element, for example, by pinching or swiping the image, using pre-defined gestures to create a modified data element. For example, a sound may be displayed visually as a coloured square. The square may be manipulated by changing its size, orientation, or the brightness or contrast of its colour. Once the modification is complete, the modified data element is output aurally. This output may be instantaneous or sequential. This may be through a loudspeaker, a pair of headphones, or another sound device. The step of displaying the sound as a coloured square involves assigning a first object to the first data element, so that the first data element is displayed visually in the form of the first object. This first object has a specific aural component assigned to it. For example, a yellow square will always output as the same tone.
When more than a first data element is input, an object is assigned to each data element, and an aural component is assigned to each object. However, many data elements may be assigned to any one object, if desired. When the object is manipulated, the aural component assigned to that object varies in accordance with the manipulation. For example, increasing the size of a yellow square increases the volume of the assigned tone. The modified data element is output aurally via at least one sound outputting device.
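The element-to-object-to-tone pipeline described above can be illustrated with a short Python sketch. This is a minimal, hedged example: the class, the colour-to-tone table, and the scaling of volume with size are illustrative assumptions, not values or names taken from the patent.

```python
import math

# Hypothetical sketch of the element -> object -> aural component mapping
# described above; names and mappings are illustrative assumptions only.

# A fixed mapping from object colour to an assigned tone (Hz).
COLOUR_TO_TONE_HZ = {"yellow": 440.0, "red": 261.6, "blue": 784.0}

class VisualObject:
    """A visual object (here a coloured square) standing in for a data element."""
    def __init__(self, colour, size=1.0):
        self.colour = colour
        self.size = size          # manipulated by user gestures (e.g. pinch to resize)

    def resize(self, factor):
        """A pre-defined gesture: scaling the square modifies the data element."""
        self.size = max(0.0, self.size * factor)

    def render_tone(self, duration_s=1.0, sample_rate=44100):
        """Output the modified element aurally: same assigned tone, volume follows size."""
        freq = COLOUR_TO_TONE_HZ[self.colour]
        amplitude = min(1.0, 0.2 * self.size)   # larger square -> louder tone
        n = int(duration_s * sample_rate)
        return [amplitude * math.sin(2 * math.pi * freq * t / sample_rate)
                for t in range(n)]

square = VisualObject("yellow")
square.resize(2.0)               # pinch-out gesture doubles the size
samples = square.render_tone()   # louder rendering of the same assigned tone
```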
The first data element may represent any desirable quantity, for example, a physical, optical, tactile, odorous or aural variable. However, the first data element may represent a sound, and the modified data element forms part of or is used within the soundstage.
The visual format may be in 3-D, such that the gestures are enacted in 3-D. This greatly enhances the manipulation possibilities, and creates a composition environment in 3-D. Alternatively, the visual format is in 2-D, for example, displayed on a touch-sensitive screen device. If the visual format is displayed on a screen, the first data element could instead be manipulated via a remote touch-sensitive device.
The generated and output sound may also be presented in 3D, commonly referred to as ambisonic.
Typical gestures used in the control of mobile smart devices can be assigned to data, and used to manipulate data to produce a variety of sounds. For this to happen, data is rendered in the form of voxels, and the gesture influences the voxel nodes. Rendering sounds (and other data) as voxels creates objects that can be manipulated. The data and application flow required to allow the manipulation of the voxel nodes is shown in Figure 1.
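As an illustration of the voxel representation, the sketch below models voxel nodes as a grid of density values and a spherical "brush" gesture that makes the touched nodes more or less solid. The grid layout, density range, and brush shape are assumptions made for the example only.

```python
# Illustrative voxel sketch (not the patent's implementation): each voxel node
# stores a density value, and a gesture applies a spherical "brush" that makes
# the touched nodes more solid or less solid.

def make_grid(size):
    """Cubic voxel grid with a uniform starting density."""
    return {(x, y, z): 0.5 for x in range(size) for y in range(size) for z in range(size)}

def apply_gesture(grid, centre, radius, delta):
    """Add `delta` density to every voxel node within `radius` of the gesture centre."""
    cx, cy, cz = centre
    for (x, y, z) in grid:
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
            grid[(x, y, z)] = min(1.0, max(0.0, grid[(x, y, z)] + delta))

grid = make_grid(8)
apply_gesture(grid, centre=(4, 4, 4), radius=2, delta=+0.3)   # "more solid" brush
apply_gesture(grid, centre=(1, 1, 1), radius=1, delta=-0.3)   # "less solid" brush
```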
Figure 1 is a flow chart illustrating the method of the present invention. In the flow chart, solid arrows represent the flow within the application, and hollow arrows represent the flow of data. To begin with, a user decides to run the application 1001, and accesses a Start menu 1002. At this point, the user chooses between a Start File Menu 1004 and a Start Settings Menu 1006, depending upon whether it is desired to start an AV (audio visual) project or create settings for a project. If the Start File Menu is chosen, the user can chose to either construct a new project 1008 or load an existing project 1010. If constructing a new project, an AV project is assigned at 1012, checking current AV project memory data at 1014 and loading the main menu to initialise the current project at 1016. If an existing project is chosen, this is loaded at step 1018 from a search of filenames in the hard disk 1020, the data deserialised at 1022 and fed back to step 1012 where the AV project is assigned, and steps 1014 and 1016 repeated. Once the Main Menu is loaded at step 1024 this can be used to select tools relating to the data display environment at step 1026, sound objects at step 1028 as well as the ingame menu at step 1030 and the settings menu at step 1032. The ingame menu 1030 enables the user to reach the file menu, to save or create new AV projects once sounds have been manipulated. The environment tools accessed at step 1026 can be used, for example, to focus gestures and therefore movements into a brush area on the surface or in the region where gestures will be made. By representing the data at this point in the form of voxels, it is possible to make the voxel nodes appear more solid (at step 1034), less solid (at step 1036) or smoother (at step 1038). Object tools can be used to create new tools or vary the effect a gesture has on a particular display, which can be saved relative to individual projects (at step 1040) or in general (at step 1042). When outputting data aurally in the form of sounds or tones, it may be that a certain amount of reverb is applied (step 1044), or other effects. These steps and variations thereof will become more apparent from the many detailed examples below. Initially the invention will be described in terms of the input data being sound, and the environment in which the sounds are manipulated being represented by soundstages.
Figure 2 shows how object positioning along the Z-axis determines the intensity (loudness) of the sound output. This is a key manner in which the output of the data aurally is managed. Intensity may or may not also be presented as volume, gain, loudness, magnitude, sound pressure level, velocity, and any other relative sound level variable.
Figure 3 shows object positioning along the X-axis, which determines the stereo width (panning) left and right of the sound output. Again, this can be varied to give a wider range of aural data output formats. Stereo height (vertical panning) is applied to object placement along the Y-axis. The Y-axis could or could not also be applied to sound equalisation (EQ) - position coordinates above the centre line apply higher EQ values, whilst position coordinates below the centre line apply lower EQ values. The object size determines how much of an EQ bandwidth is applied, i.e. a larger object would apply a larger EQ bandwidth. The frequency bandwidth an object does not occupy will either be reduced or removed. In order to apply two separate bandwidth EQs to a sound (with a cut between), a duplicate object would need to be created, and both objects separately positioned in the respective spaces to apply the desired EQs.
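A hedged sketch of the position-to-sound mapping described for Figures 2 and 3 is given below: depth along Z controls loudness, X controls stereo pan, Y shifts the EQ band centre, and object size widens the EQ bandwidth. The stage dimensions, ranges, and scaling constants are illustrative assumptions rather than values from the patent.

```python
# Position-to-mix sketch: Z depth -> loudness, X -> stereo pan,
# Y -> EQ band centre, object size -> EQ bandwidth.

def position_to_mix(x, y, z, size, stage_depth=10.0, stage_width=10.0, stage_height=10.0):
    # Closer to the listener (small z) -> louder; further away -> quieter.
    gain = max(0.0, 1.0 - z / stage_depth)
    # Left/right position maps to a pan value in [-1.0, +1.0].
    pan = max(-1.0, min(1.0, 2.0 * x / stage_width))
    # Height above/below the centre line shifts the EQ band up/down (20 Hz - 20 kHz).
    eq_centre_hz = 20.0 * (1000.0 ** (0.5 + y / stage_height))
    # A larger object occupies a wider EQ bandwidth (here expressed in octaves).
    eq_bandwidth_oct = 0.5 + size
    return {"gain": gain, "pan": pan,
            "eq_centre_hz": eq_centre_hz, "eq_bandwidth_oct": eq_bandwidth_oct}

print(position_to_mix(x=-3.0, y=2.0, z=1.5, size=1.0))
```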
Figure 4 shows the outer side partitions of the 3D environment used to isolate object(s) within a partitioned area to a specific sound output (see Figure 20 below). Further partitioning could be made to include the separate EQ bands if desired (or in another/EQ mode). The size and amount of partitions may be changed. The output assignment may be changed. The central un-isolated space uses all outputs, similar to surround sound stereo, which will present the output sound relative to the placement of the object within the un-isolated environment. Objects can be automated to follow defined paths to generate synchronous ambisonic panning effects.
Figure 5 shows grouped together soundstages, with objects set within each environment and the current active soundstage in the centre. This grouping forms a navigable map (as shown in Figure 6). Figure 6 shows map navigation and how the current active soundstage changes dependent on location data.
Figure 7 shows that the active soundstage can be anywhere on the map, irrespective of territorial boundaries. All objects outside of the active soundstage area are stopped. All objects within the active soundstage area are played.
Figure 8 shows two waveforms, A and B. In Figure 8a the first waveform relationship shows both A and B in perfect phase, Figure 8b shows B forward half a cycle, Figure 8c shows B behind half a cycle, and Figure 8d shows both A and B completely out of phase.
Figure 9a shows a sine wave waveform cycle, Figure 9b shows a square wave waveform cycle, Figure 9c shows a triangle wave waveform cycle and Figure 9d shows a sawtooth wave waveform cycle.
Figure 10a shows a 3D spherical object, Figure 10b shows a 3D cubic object, Figure 10c shows a 3D conal object, and Figure 10d shows a 3D pyramidal object. A) If an object is spherical/ellipsoidal, the waveform will be sinusoidal; B) If an object is cubic, the waveform will be a square wave; C) If an object is equilateral or isosceles prismatic and/or has rounded sides, then the waveform will be a triangle wave; D) If the shape is right-angle prismatic and/or has flat sides, then the waveform will be sawtooth.
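The shape-to-waveform rules listed for Figure 10 can be expressed as a small generator. The sketch below is only an illustration of those rules; the function names, shape labels, and default parameters are assumptions.

```python
import numpy as np

# Illustrative mapping of object shape to base waveform, following the rules for
# Figure 10 (sphere -> sine, cube -> square, rounded prism -> triangle,
# right-angled/flat-sided prism -> sawtooth).

def base_waveform(shape, freq=220.0, duration_s=0.5, sample_rate=44100):
    t = np.linspace(0.0, duration_s, int(duration_s * sample_rate), endpoint=False)
    phase = (freq * t) % 1.0                      # normalised phase in [0, 1)
    if shape == "sphere":                         # spherical/ellipsoidal -> sine wave
        return np.sin(2 * np.pi * freq * t)
    if shape == "cube":                           # cubic -> square wave
        return np.sign(np.sin(2 * np.pi * freq * t))
    if shape == "rounded_prism":                  # equilateral/rounded prism -> triangle wave
        return 4.0 * np.abs(phase - 0.5) - 1.0
    if shape == "right_prism":                    # right-angled/flat-sided -> sawtooth wave
        return 2.0 * phase - 1.0
    raise ValueError("unknown shape")

square_wave = base_waveform("cube")
```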
Figure 11 shows a group of objects that originally started out as cubes and have been changed in various ways. A cube is used as a base shape to serve as an example only. Figure 11a shows a cube that has almost been transformed into a sphere. Figure 11b shows a cube that has been rounded at the edges. Figure 11c shows a cube that has been indented on each of the sides from the centre. Figure 11d shows a cube that has been fully rounded on all vertical sides to form a cylinder.
Figure 11e shows a cube that has been tilted to one side to form a parallelogram shape. Figure 11f shows a cube that has had one side tilted to form a trapezium shape prism. Figure 11g shows a cube that has been formed into a hexagonal prism. Figure 11h shows a cube that has been formed into a multi-sided diamond-like shape. Figure 11i shows a cube that has been rounded at the edges and corrugated on all sides. Figure 11j shows a cube that has been formed into a curved pyramid. Figure 11k shows a cube that has been formed into an indented cone.
Figure 12 shows the waveform shapes created by such shape manipulation, as described in Figure 11 - as in Figure 11, a cube is used as an example to explain shape manipulation, hereby a square wave is used as the base waveform shape to demonstrate the relationship between the two. Therefore, the following figures are all modified square waves, for example purposes only, even though some may more closely resemble that of sine, triangle, or sawtooth waves. Figure 12a shows an equally combined sine waveform and square waveform, which may produce a large amount of high frequency loss. Figure 12b shows a square waveform with rounded corners, which may produce a small amount of high frequency loss. Figure 12c shows an indented square wave, which may produce an amount of low frequency loss. Figure 12d shows a square wave with a curved top, which may produce a low frequency boost (accentuated fundamental). Figure 12e shows a square wave that is ramped at the start, which may produce total high frequency loss. Figure 12f shows a square wave that is sloped to end, which may produce a low frequency phase shift. Figure 12g shows a square wave that is ramped at start and end, which may produce both a high and a low frequency loss. Figure 12h shows a square wave that has been ramped at the start and sloped to end, which may produce a high frequency loss and low frequency phase shift. Figure 12i shows a square wave with ripples at the peak, which may produce a damped oscillation. Figure 12j shows a square wave that has a ramped curve at the start to peak and a sloped end, which may produce high frequency loss and low frequency phase shift. Figure 12k shows a square wave that has a ramped curve from the start to the end, which may produce low frequency loss and low frequency phase shift.
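One way to read the "rounded corners = small high frequency loss" relationship of Figure 12b is as a low-pass filter applied to the base waveform. The sketch below applies a simple one-pole low-pass to a square wave, which visibly rounds its corners; the cutoff value and filter choice are assumptions made for illustration.

```python
import numpy as np

# Hedged sketch: rounding the corners of a square wave by removing high
# frequencies with a one-pole low-pass filter.

def square_wave(freq, duration_s, sample_rate=44100):
    t = np.linspace(0.0, duration_s, int(duration_s * sample_rate), endpoint=False)
    return np.sign(np.sin(2 * np.pi * freq * t))

def one_pole_lowpass(signal, cutoff_hz, sample_rate=44100):
    """y[n] = y[n-1] + a * (x[n] - y[n-1]); removing highs rounds the corners."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.zeros_like(signal, dtype=float)
    y = 0.0
    for i, x in enumerate(signal):
        y += a * (x - y)
        out[i] = y
    return out

rounded = one_pole_lowpass(square_wave(220.0, 0.1), cutoff_hz=2000.0)
```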
Figure 13 shows two different complex objects. Such complex objects may be included in a scene, included in a soundstage, or imported to the system via any of the aforementioned hardware input devices, including virtual reality headset spatial awareness to include objects present in the user's real environment, or computer vision analysis of 2D or 3D images. Figure 13a shows an animated motor vehicle. A system analysis of this object for sound production would include: object recognition = one car and no background; monochromatic colour - the range is mostly grey, black, and white; smooth reflective surface; rounded, curved shapes. The output sound production for this image may be relative to one (layer) atonal sine wave with added white noise and reverb. Other visual effect analysis may be included (that cannot be represented in the diagram), such as brightness, which would also apply a relative effect to the sound output. Figure 13b shows an animated (pale green) house. A system analysis of this object for sound production would include: object recognition = house, door, windows; colour range = red, green; multiple surfaces, majority non-reflective and rough; straight square and triangular shapes. The output sound production for this image may be relative to a complex sound made up of two (layers) square and triangle wave frequencies at different musical tones - C1, A#3. Other visual effect analysis may be included (that cannot be represented in the diagram), such as brightness, which would also apply a relative effect to the sound output.
Figure 14 shows an object and how an object's [3D] dimensions relate to the Cartesian coordinate system. The X-axis relates to object Width; the Y-axis relates to object Height; the Z-axis relates to object Depth. The centre point of the Cartesian coordinate marker remains the centre of the object - plus (+) and minus (-) lengths of each axis can be changed independently from this centre point.
Figure 15 shows different combinations of cross-section slices of different object surface layer textures. These diagrams describe the relation of object surface layer texturing combinations in a defined proximity of one another and the possible resulting compositional texturing. a: Monophony: If there is only one object and the surface is set to default (no texture) then the sound/music will be monophonic i.e. remain the way it is originally notated by the composer. b: Heterophony 1: If there is only one object layer and the surface is set to have texture then the sound/music will be heterophonic. c: Heterophony 2: If there is more than one object and the surface of one object is set to have no texture, but the other objects are set to have different textures, then the sound/music may become heterophonic. d: Homophony: If there is more than one object and all surface layers are set to have no texture, then the sound/music may become homophonic. e: Homorhythm: If there is more than one object and all surface layers are set to have texture, but all the textures are the same, then the sound/music may become homorhythmic. f: Polyphony: If there is more than one object and all surface layers are set to have texture, but all the textures are different, then the sound/music may become polyphonic.
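The texture rules of cases a to f above amount to a simple classification over the surface-layer textures of the objects present. The following sketch paraphrases those rules; representing each object only by its surface texture (with None meaning untextured) is an assumption made to keep the example small.

```python
# Illustrative classification of the compositional texture rules of Figure 15.

def compositional_texture(surface_textures):
    textures = list(surface_textures)
    if len(textures) == 1:
        return "monophony" if textures[0] is None else "heterophony"
    if all(t is None for t in textures):
        return "homophony"
    if any(t is None for t in textures):
        return "heterophony"          # mixed untextured and textured surfaces
    if len(set(textures)) == 1:
        return "homorhythm"           # all textured, all textures the same
    return "polyphony"                # all textured, textures differ

print(compositional_texture([None]))                     # monophony
print(compositional_texture(["ripple", "ripple"]))       # homorhythm
print(compositional_texture(["ripple", "grain", "saw"])) # polyphony
```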
Figure 16a shows a monophonic texture pattern traditionally notated for sheet music. Monophony consists of melody without accompanying harmony. Hence, this is created by one untextured master layer. Figure 16b shows a heterophonic texture pattern traditionally notated for sheet music. Heterophony is characterized by the simultaneous variation of a single melodic line. Such a texture can be regarded as a kind of complex monophony in which there is only one basic melody, but realized at the same time in multiple voices, each of which plays the melody differently, either in a different rhythm or tempo, or with various embellishments and elaborations.
Hence, this is created by either (1) one textured master layer, or (2) an untextured master layer being accompanied by layers of varying textures. Figure 16c shows a homophonic texture pattern traditionally notated for sheet music. In homophony, two or more parts move together in harmony, the relationship between them creating chords. Hence, this is created by multiple untextured (monophonic) layers.
Figure 16d shows a homorhythmic texture pattern traditionally notated for sheet music. Homorhythm is similar to homophony; however, the parts are only rhythmically similar, not harmonically. Hence, this is created by multiple textured layers, with all textures being similar. Figure 16e shows a polyphonic texture pattern traditionally notated for sheet music. Polyphony consists of two or more simultaneous lines, all of independent melody. Hence, this is created by multiple textured layers, with all textures being different.
Figure 17 shows a cross-section slice of an object, which displays each of the layers within the object and their relative distances between each layer. The centre distance is the size of the core. The distance between each layer may be changed.
The distance between each layer determines the audible level of each layer in relation to each layer present in the object. The surface layer is always the master level. If any two layers are close together, their levels will be equal, but the further apart from the surface layer, the lower the level will be in relation. The order of the layers may or may not be changed accordingly for mixing purposes.
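The layer-mixing rule of Figure 17 can be sketched as follows: the surface layer always carries the master level, and each inner layer is attenuated in proportion to its distance from the surface. The linear fall-off and the maximum distance are assumptions for illustration only.

```python
# Hedged sketch of the Figure 17 layer-mixing rule.

def layer_levels(distances_from_surface, master_level=1.0, max_distance=10.0):
    """Return a gain per layer; layer 0 (the surface) always gets the master level."""
    levels = []
    for d in distances_from_surface:
        attenuation = min(1.0, d / max_distance)
        levels.append(master_level * (1.0 - attenuation))
    return levels

# Surface layer at distance 0, two inner layers progressively deeper.
print(layer_levels([0.0, 2.5, 7.0]))   # -> [1.0, 0.75, 0.3]
```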
Figure 18 shows a standard mobile media device, including built-in camera, microphone, and light sensor. Many mobile devices also feature GPS location services, Bluetooth, Wi-Fi, and accelerometers. Such I/Os and I/O data may be used as control data in the generation of sound in the methods and processes described within this system. For example, a live feed of facial tracking to recognise a busy environment, location data to determine urban or rural landscapes, light sensitivity to control various effects, etc. Other sensors may be used as inputs to control the data element(s). This may include, but is not limited to, computer vision, kinetic/motion, gesture, and biofeedback - namely, such sensors may include, but are not limited to, Leap Motion, which uses infrared LEDs for precise 3D hand and finger recognition and tracking; Microsoft Kinect/Softkinetic DepthSense, which uses Time-of-Flight CMOS sensors for 3D full-body recognition and tracking; Myo, which uses mechanomyogram (MMG) sensors that observe muscle activity in the forearm(s) for hand and finger movement and gesture recognition and tracking.
Figure 19 shows a diagram of the magnetic interaction relationship between objects. Objects may be given magnetic properties of varying strengths, either with a plus (+) or a minus (-) polarization. When objects are in proximity of such fields, this determines whether they are attracted to or repelled away from one another. The size of such proximal area is determined by the strength of the magnetic field applied to any object.
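The magnetic interaction of Figure 19 can be illustrated with a small signed-strength model: like polarities repel, opposite polarities attract, and the interaction only applies within a proximity set by the field strengths. The inverse-square magnitude and the reach formula are assumptions for the sketch, not part of the patent text.

```python
# Illustrative model of the magnetic object interaction of Figure 19.

def magnetic_interaction(strength_a, strength_b, distance):
    """Return a signed force: negative = attraction, positive = repulsion, 0 = out of range."""
    reach = abs(strength_a) + abs(strength_b)     # stronger fields reach further
    if distance > reach or distance <= 0.0:
        return 0.0
    magnitude = abs(strength_a * strength_b) / distance ** 2
    same_polarity = (strength_a > 0) == (strength_b > 0)
    return magnitude if same_polarity else -magnitude

print(magnetic_interaction(+2.0, -1.0, distance=1.5))   # attraction
print(magnetic_interaction(+2.0, +1.0, distance=1.5))   # repulsion
```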
Figure 20 shows a diagram of an example speaker array completely surrounding a dance floor from all angles/directions. The polygons represent speakers that may each be full speaker systems (providing all frequency ranges, including high, mid, low, and sub), or may be separated by EQ ranges, i.e. high ranges, mid ranges, low ranges, and sub ranges.
These and other features of the invention will now be illustrated by the following examples.
Example 1: Ambisonic Soundstage
In this example, the present invention provides a system for sound mixing and sequencing in 3-dimensions (3D), in real-time, whereby sound sources are visually displayed via a graphics engine in 3D, within a 3D environment.
The processes described herein refer to 3-dimensions, but may or may not be represented in 2-dimensions, digitally displayed and modified digitally via a sound and graphics engine, or calculated linearly through transcription. The 3-dimensional environment herein is described using a Y-up, left-handed (Cartesian) coordinate system, but could be applied to any similar method.
Sensor data may or may not be used either as control data or as input data. This may or may not be camera data, microphone data, location data, biometric data, or uploaded multimedia. Non-musical data from such sensors and/or media, such as motion, gesture, position, orientation, location, ambient noise, face tracking, heart rate, colour analysis, contrast, etc., may or may not be used as input data for the invention to generate further manipulation.
The entire environment is a virtual sound stage. Each object's position within an active environment can be moved around, which is output in real-time via a sound system. Manipulation of object position can control sound output panning and intensity. Positioning of objects in the environment can control sound frequency equalisation. The outer sides of the 3D environment can be partitioned to isolate object(s) within a partitioned area to a specific sound output or EQ. An example output arrangement may represent a dome of tiered speaker arrays/rings surrounding a dance floor or concert hall. This allows the user to mix objects within isolated sources and points of the soundstage.
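To illustrate how an object placed in an outer partition could be isolated to one output while objects in the central un-isolated space use all outputs, the sketch below routes an object by its position. The quadrant-style partition geometry, threshold ratio, and output names are simplifying assumptions, not the patent's arrangement.

```python
# Illustrative routing sketch for the partitioned soundstage described above.

def route_object(x, z, stage_half_width=10.0, stage_half_depth=10.0, inner_ratio=0.5):
    inner_x = stage_half_width * inner_ratio
    inner_z = stage_half_depth * inner_ratio
    if abs(x) <= inner_x and abs(z) <= inner_z:
        return "all_outputs"                     # central space: surround-style rendering
    if abs(x) > abs(z):
        return "right_array" if x > 0 else "left_array"
    return "rear_array" if z > 0 else "front_array"

print(route_object(x=8.0, z=1.0))   # -> "right_array"
print(route_object(x=1.0, z=2.0))   # -> "all_outputs"
```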
Each virtual sound stage represents real-world environmental conditions, which can be controlled. These conditions affect variables of the sound, as they would in reality. Such variables include, but are not limited to atmospheric pressure, temperature, and humidity.
The size of each soundstage environment can be controlled. The size of the space controls a reverb effect that is applied to each object within the space. A larger space will apply a wide reverb, whereas a smaller space will apply a narrow reverb. The overall perceived effect on the sound will be dependent on each object position.
Each space can also be set as open or closed, which sets the reverb to open or closed reverb. The type of reverb applied is representative of natural reverb that would occur in a similar real environment.
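The size and open/closed rules above can be collected into a small parameter mapping. The sketch below is a hedged illustration: the parameter names, scaling constants, and the halving of decay for open spaces are assumptions chosen only to show the direction of the relationships (larger space gives wider, longer reverb).

```python
# Sketch of the environment-to-reverb rule: larger soundstage -> wider reverb,
# open or closed space -> different reverb character.

def stage_reverb(stage_size_m, is_open):
    reverb_width = min(1.0, stage_size_m / 100.0)    # larger space -> wider reverb
    decay_s = 0.2 + 0.05 * stage_size_m              # bigger rooms ring for longer
    if is_open:
        decay_s *= 0.5                               # open spaces lose reflections
    return {"width": reverb_width, "decay_s": decay_s,
            "type": "open" if is_open else "closed"}

print(stage_reverb(stage_size_m=40.0, is_open=False))
```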
Completed soundstages can be grouped together to form a map, which can be navigated like a virtual world. As the user navigates the map, the current active soundstage changes, dependent on the user's current location. Through this navigation, the user can sequence entire presentations of music.
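Map navigation as described for Figures 5 to 7 can be sketched as a lookup of the user's current location against soundstage areas on the map: the stage containing the user becomes active and its objects play, while all others stop. The rectangular stage areas and names below are hypothetical.

```python
# Illustrative map navigation: select the active soundstage from the user's location.

def active_soundstage(stages, user_xy):
    """`stages` maps a name to its (x_min, y_min, x_max, y_max) area on the map."""
    ux, uy = user_xy
    for name, (x0, y0, x1, y1) in stages.items():
        if x0 <= ux <= x1 and y0 <= uy <= y1:
            return name           # objects inside this stage are played
    return None                   # no active stage: all objects stopped

stages = {"intro": (0, 0, 10, 10), "drop": (10, 0, 20, 10)}
print(active_soundstage(stages, (12.0, 4.0)))   # -> "drop"
```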
Example 2: Visual and Physical Modelling Synthesis
The invention may be used as a process, method, and/or system for visual-physical modelling synthesis of complex sounds using frequencies of visual light, properties of physical matter, and natural or unnatural user and environmental conditions in 2-dimensions and/or 3-dimensions, whereby the visual, matter, and shape data embodies the sound to be produced (similar to traditional instruments).
Visual and physical data, such as hue, brightness, weight, and shape, are analysed and used by the system's algorithm to process and synthesise sound, using non-instrumental/non-musical values to create objects for physical modelling synthesis.
The data processing described herein refers to 3-dimensions, but may or may not be represented in 2-dimensions, digitally displayed and modified digitally via a sound and graphics engine, or calculated linearly through transcription. Sensor or media data may or may not be used either as control data or as input data. This may include, but is not limited to, motion and gesture data, camera data, microphone data, location data, biometric data, electroencephalography (EEG) data, mechanomyogram (MMG) data, uploaded multimedia, and/or any combination of the aforementioned. Non-musical data received by such sensors and/or media, such as motion, gesture, position, orientation, location, ambient noise, face tracking, heart rate, colour analysis, contrast, etc., may or may not be used as input data for the invention to generate audio-visual synthesis output and/or for further manipulation. This could, for example, range from controlling the system via a motion sensor by hand within a 3D virtual reality (VR) world, to analysing a photographic image or live camera feed to generate sound.
Direct visual and physical sound modelling relationships are discussed below:
VISUAL LIGHT → SOUND
- Colour/hue → Note/tone/semitone + octave
- Volume → Amplitude
- Layers → Additive synthesis
- Brightness/luma → White noise
- Contrast → Phase
- Tint → High pass filter
- Shade → Low pass filter
- Saturation → Accuracy/syncopation
- Invert → Frequency inversion
- Definition/pixel rate → Fidelity/bit rate

PHYSICS → SOUND
- Weight → Velocity, tremolo/amplitude modulation
- Density → Oscillation/intensity
- State → Vibrato/frequency modulation
- Shape → Waveform shape
- Texture → Harmonic-rhythmic-melodic texture
- Width (R/L) → Stereo width
- Height (U/D) → Stereo height, decay (U), release (D)
- Depth/length (F/B) → Length of sustain
- Reflection → Reverb

i) Light-to-Sound Synthesis
Sound is synthesised from variables of light frequencies and visual effects. These are broken down into individual parameters of visual light frequency, which translate and control variables that synthesise audible sound frequencies. More than one frequency can be used; each individual frequency is hereby called a layer(s). Each layer adds a signal/tone that can be individually edited and controlled as described throughout this document. This is similar to additive synthesis. These layers are interconnected and serve as part of a whole object.
Colour/hue sets the actual frequency of the base tone. This may or may not correspond approximately to the range shown below, or to a full 256 RGB or CMYK colour range. Colour ranges larger or smaller than the one shown below will be spread or condensed between the colour bands shown; however, such bands remain within their colour of the spectrum. For example, in a colour range of 256 colours, all sound frequencies between the musical note range F3 to A3 would still remain within the yellow spectrum, but in this instance yellow would be divided into different tones of yellow.
| Colour | Colour frequency (THz) | Musical note range | Octave |
|---|---|---|---|
| Black | None | - | - |
| Infrared | 3-400 | C0-B0 | 0 |
| Red | 400-484 | C1-A#2/Bb2 | 1-2 |
| Orange | 484-508 | A#2/Bb2-E3 | 2-3 |
| Yellow | 508-526 | F3-A3 | 3 |
| Green | 526-606 | A#3/Bb3-G5 | 3-5 |
| Blue | 606-668 | G5-B6 | 5-6 |
| Indigo | 668-715 | B6-C8 | 6-8 |
| Violet | 715-789 | C8-B8 | 8 |
| Ultraviolet | 789-3000 | C9-D#10/Eb10 | 9 |
| White | All | All | All |

Non-colours, black and white, and non-visual colours, infrared and ultraviolet, are included to represent disparate sounds. White contains all possible frequencies, which represents white noise. Black has no frequency and represents atonal sounds, such as, but not limited to, drum hits. Infrared contains very low-end frequencies, down to frequencies below human hearing. Ultraviolet contains extreme high-end frequencies, up to those above human hearing.
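A minimal sketch of the colour-to-note mapping follows, covering only a subset of the bands in the table above; treating each band as a linear interpolation in semitone (MIDI) space is an assumption made for illustration.

```python
# Subset of the colour-to-note table, expressed with MIDI note numbers
# (C1 = 24, A#2/Bb2 = 46, E3 = 52, F3 = 53, A3 = 57, A#3/Bb3 = 58, G5 = 79).
# Band boundaries follow the table above; interpolating linearly in semitone
# space within each band is an assumption for illustration.
BANDS = [
    (400.0, 484.0, 24, 46),   # Red:    C1 .. A#2/Bb2
    (484.0, 508.0, 46, 52),   # Orange: A#2/Bb2 .. E3
    (508.0, 526.0, 53, 57),   # Yellow: F3 .. A3
    (526.0, 606.0, 58, 79),   # Green:  A#3/Bb3 .. G5
]

def light_thz_to_audio_hz(thz: float) -> float:
    for lo_thz, hi_thz, lo_midi, hi_midi in BANDS:
        if lo_thz <= thz <= hi_thz:
            frac = (thz - lo_thz) / (hi_thz - lo_thz)
            midi = lo_midi + frac * (hi_midi - lo_midi)
            return 440.0 * 2 ** ((midi - 69) / 12)  # MIDI note -> Hz
    raise ValueError("light frequency outside the illustrated bands")

print(round(light_thz_to_audio_hz(517.0), 2))  # a yellow hue, mid-band
```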
The volume of a colour (of a predefined range 0-maximum) determines the amplitude of a layer (between a predefined range 0-maximum).

The visual brightness/luma of a colour (of a predefined range 0-maximum) determines the amount of additional white noise (between a predefined range 0-maximum) added to a layer, or object.

The visual tint effect of a colour (of a predefined range 0-maximum) determines the amount of high pass filter (HPF) applied (between a predefined range 0-maximum) to a layer, or object. The maximum amount of effect applied will fully kill the audible sound so that nothing can be heard.

The visual shade effect of a colour (of a predefined range 0-maximum) determines the amount of low pass filter (LPF) applied (between a predefined range 0-maximum) to a layer, or object. The maximum amount of effect applied will fully kill the audible sound so that nothing can be heard.

The visual saturation effect of a colour (of a predefined range 0-maximum) determines the amount of band pass filter (BPF) applied (between a predefined range 0-maximum) to a layer, or object. The maximum amount of effect applied will fully kill the audible sound so that nothing can be heard.

The visual contrast effect parameter (of a predefined range from -100% to +100%, centred on 0) controls the phase relationship between two or more layers of an object by moving their cycles forwards or backwards. If more than one layer is used, this can be used to move each layer in and out of phase.

The visual sharpness/acutance effect parameter (of a predefined range from -100% to +100%, centred on 0) determines the accuracy of a layer in relation to the musical time (tempo). When set at 0, the layer will be in sync with the tempo; when set at +100%, the layer will be a full musical beat ahead of the tempo; when set at -100%, the layer will be a full musical beat behind the tempo. Intermediate percentages represent proportional time offsets, i.e. +50% would set the layer ahead of the tempo by half a beat, whereas -25% would set the layer behind by a quarter of a beat. This could be viewed as syncopation (delay and anticipation).
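The sketch below gathers several of these mappings into one hypothetical control function; the parameter names, filter ranges, and scaling are assumptions, not values taken from the specification.

```python
# Hypothetical translation of visual effect amounts into DSP control values.
# tint, shade, saturation are 0.0..1.0; sharpness is -1.0..+1.0.

def visual_to_dsp(tint, shade, saturation, sharpness, tempo_bpm):
    beat_s = 60.0 / tempo_bpm
    return {
        # Tint raises a high-pass cutoff; at 1.0 the audible band is removed.
        "hpf_cutoff_hz": 20.0 + tint * (20000.0 - 20.0),
        # Shade lowers a low-pass cutoff; at 1.0 the audible band is removed.
        "lpf_cutoff_hz": 20000.0 - shade * (20000.0 - 20.0),
        # Saturation narrows a band-pass filter (0 = full band, 1 = closed).
        "bpf_bandwidth_octaves": 10.0 * (1.0 - saturation),
        # Sharpness offsets the layer in time: +1.0 = a full beat ahead
        # (earlier), -1.0 = a full beat behind, 0 = on the tempo grid.
        "timing_offset_s": -sharpness * beat_s,
    }

print(visual_to_dsp(tint=0.2, shade=0.0, saturation=0.5,
                    sharpness=0.5, tempo_bpm=120))
```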
The visual invert effect inverts all layers of an object. This flips each layer to the polar opposite frequency between 20 Hz and 20,000 Hz; the centre point for such mirroring is 9,990 Hz. This is similar to frequency inversion.
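A small sketch of this mirroring follows, using the 9,990 Hz centre point stated above; clamping the result to the audible band is an added assumption.

```python
# Frequency inversion as described above: each layer's frequency is mirrored
# about a centre point within the 20 Hz - 20,000 Hz band. The 9,990 Hz centre
# follows the text; clamping to the audible band is an assumption.

def invert_frequency(freq_hz: float, centre_hz: float = 9990.0) -> float:
    mirrored = 2.0 * centre_hz - freq_hz
    return min(max(mirrored, 20.0), 20000.0)  # keep the result audible

print(invert_frequency(440.0))    # a low-mid tone becomes a high tone
print(invert_frequency(15000.0))  # a high tone becomes a low-mid tone
```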
Visual definition (pixel rate) controls the fidelity (bit rate) of a layer, or object. A higher pixel rate and better aspect ratio provide a higher audio bit rate, and vice versa. Lowering the definition to blur or distort the visual applies degrees of distortion to a layer, or object.
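One way such a definition-to-fidelity mapping could be sketched is as a simple bit-depth reduction ("bitcrush") keyed to a normalised definition value; the 4-16 bit range and scaling are assumptions for illustration.

```python
import numpy as np

# Hypothetical "bitcrush" keyed to visual definition: a lower pixel rate maps
# to a lower bit depth, adding quantisation distortion to the layer/object.

def apply_definition(signal: np.ndarray, definition: float) -> np.ndarray:
    """definition is normalised 0.0 (very blurred) .. 1.0 (full definition)."""
    bits = int(round(4 + definition * 12))        # assumed 4..16-bit range
    levels = 2 ** bits
    return np.round(signal * (levels / 2)) / (levels / 2)

t = np.linspace(0.0, 1.0, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 220.0 * t)
crushed = apply_definition(tone, definition=0.25)  # audibly distorted copy
```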
The distance between layers may or may not be adjustable; this controls the audio level (not to be confused with volume/amplitude) relationship between the layers present in any one object.
ii) 3D Waveshape Modelling
The shape of an object directly controls the audio waveform shape. The basic/fundamental waveform shapes and their corresponding object shapes are:

| Waveform shape | Object shape |
|---|---|
| Sine/sinusoidal wave | Curved/rounded (spherical) |
| Square wave | Square (cuboidal) |
| Triangle wave | Triangular, even (prismatic) |
| Sawtooth wave | Triangular, edged (prismatic) |

These shapes are the basis of how the shape data of objects translates to the output sound/waveform. This is not limited to basic shapes, and may or may not include other 3D shapes. Other considerations may include polyhedrons, superquadrics, superellipsoids, supertoroids, and fractaloids. Any complex shape is generally a combination of basic shapes. Similarly, more complex waveform shapes are formed from combinations of the basic waveform shapes. If such shapes are used, then respective complex waveform shapes will be generated.
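A minimal sketch of the four basic waveform shapes, and of combining them to approximate a complex shape, is given below; the sample rate and mixing weights are example values.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed)

def waveform(shape: str, freq_hz: float, duration: float = 1.0) -> np.ndarray:
    """Generate the four basic waveform shapes named in the table above."""
    t = np.arange(int(SR * duration)) / SR
    phase = (freq_hz * t) % 1.0                    # normalised 0..1 phase
    if shape == "sine":                            # curved/rounded objects
        return np.sin(2 * np.pi * phase)
    if shape == "square":                          # cuboidal objects
        return np.where(phase < 0.5, 1.0, -1.0)
    if shape == "triangle":                        # even prismatic objects
        return 4.0 * np.abs(phase - 0.5) - 1.0
    if shape == "sawtooth":                        # edged prismatic objects
        return 2.0 * phase - 1.0
    raise ValueError(f"unknown shape: {shape}")

# A complex object could be approximated by mixing basic shapes, e.g.:
complex_wave = 0.6 * waveform("sine", 220.0) + 0.4 * waveform("square", 220.0)
```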
iii) Physical Timbre Modelling
The system provides a method where physical properties of matter determine the timbre aspects of sound. Physical properties can be assigned to objects that translate as properties of sound.
The physical weight data of an object (between a predefined range 0-maximum) determines the speed of attack of the amplitude envelope (velocity) and amount of amplitude modulation (AM)/tremolo applied (between 0-maximum).
The physical state data of an object (between a predefined range 0-maximum) determines the amount of frequency modulation (FM)/vibrato applied (between 0-maximum). If an object(s) is used in a live environment, such as that described in Ambisonic Soundstage above, where a temperature control is applied, this can change the state of an object; for example, heating above a set melting/boiling point may melt a solid object into a liquid state, or boil a liquid object into a gas state, or cooling below a set freezing point may freeze a liquid into a solid state.
The physical density data of an object (between a predefined range 0-maximum) determines the amount of oscillation/intensity applied to AM and FM (between 0-maximum). The physical width data of an object (between a predefined range 0-maximum) determines the stereo width of the sound (between 0-maximum). This is measured both left and right of the centre point of an object. Distance measurements left and right may or may not be adjusted independently.
The physical height data of an object (between a predefined range 0-maximum) determines the stereo height of the sound, and the length of decay and release of the amplitude envelope (between 0-maximum). This is measured both up and down of the centre point of an object. The upward measurement determines the length of decay, while the downward measurement determines the length of release. Distance measurements both up and down may or may not be adjusted independently, although only the overall height measurement of an object affects stereo height.
The physical depth (or length) data of an object (between a predefined range 0-maximum) determines the length of sustain of the amplitude envelope (between 0-maximum). This is measured both back and forth of the centre point of an object.
Distance measurements both backward and forward may or may not be adjusted independently, although only the overall distance measurement affects the length of sustain.
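The sketch below collects the physical-property mappings described above (weight, state, density, width, height, and depth) into one hypothetical parameter translation; all scaling constants, and the choice that heavier objects produce a slower attack, are assumptions.

```python
# Hypothetical mapping of physical object data onto envelope and modulation
# parameters, following the relationships described above. All ranges and
# scaling constants are assumptions for illustration.

def physics_to_sound(weight, state, density, width,
                     height_up, height_down, depth):
    """All inputs normalised 0.0 .. 1.0 (0 = minimum, 1 = maximum)."""
    return {
        # Weight -> attack speed (assumed heavier = slower attack) + AM depth.
        "attack_s": 0.005 + weight * 0.5,
        "am_depth": weight,
        # State -> FM/vibrato depth; density -> modulation rate/intensity.
        "fm_depth": state,
        "mod_rate_hz": 0.5 + density * 12.0,
        # Width -> stereo width; height -> stereo height plus decay/release.
        "stereo_width": width,
        "decay_s": height_up * 2.0,
        "release_s": height_down * 2.0,
        "stereo_height": height_up + height_down,
        # Depth/length -> length of sustain.
        "sustain_s": depth * 4.0,
    }

print(physics_to_sound(0.7, 0.2, 0.5, 0.8, 0.4, 0.3, 0.6))
```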
iv) Physical Texture Composition
The physical/visual surface texture of layer(s)/object(s) determines the added (musical) harmonic, rhythmic, and melodic texture of objects in close proximity to one another. Such textures may include, but are not limited to, flat/smooth (no texture), grainy, coarse, or rough. The number and/or combination of object/layer surface textures within a specific vicinity determines the added musical texture. This proximity can be user defined and is also directly affected by magnetism and other naturally occurring relationships between objects and conditions of the environment.
The amount/level of visual texturing applied to layer(s)/object(s) may or may not be increased or decreased, and such levelling may or may not have an effect on the level of complexity and texturing added to the sound/music. This may or may not also include contrapuntal texturing, such as, but not limited to, canon and fugue.
v) Image-to-Sound Effects
Other visual effects may be used to affect objects, which may or may not be a combination of the aforementioned variables. These may or may not include, but are not limited to, reflection, strobe/pulsate, and hollowness. These may or may not apply such sound effects as, but not limited to, flanger, phaser, chorus, reverb, transform, panning, and echo. Flanger, phaser, and chorus are all applied by layering and controlling layers in different ways. A flanger effect is created when a delayed layer is added to an original (master) layer with a continuously variable delay (usually smaller than 10 ms); one layer is slowed so that it becomes out of phase with its partner and produces a phasing effect. Moving layers in and out of phase with the master makes the phasing effect appear to slide up the frequency spectrum, and this phasing up and down the register can be performed rhythmically. A phaser effect is produced when a layer is split, a portion is filtered with an all-pass filter to produce a phase shift, and the unfiltered and filtered signals are mixed. A chorus effect is produced when a delayed layer is added to a master layer with a constant, short delay, long enough (above 5 ms) to be audible but short enough not to be perceived as echo (if the delay is too short, it will destructively interfere with the un-delayed layer and create a flanging effect); the delayed signals are slightly pitch shifted to more realistically convey the effect of multiple voices.
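As a sketch of the layering principle behind these effects, the following implements a basic flanger as a short, continuously varying delay mixed with the dry signal; the delay range, modulation rate, and mix level are example values, and a chorus would differ mainly in using a longer, pitch-shifted delay.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed)

def flanger(x: np.ndarray, max_delay_ms=8.0, rate_hz=0.25, mix=0.5):
    """Mix the signal with a copy whose delay varies continuously below 10 ms,
    producing the sweeping comb-filter (flanging) effect described above."""
    n = np.arange(len(x))
    lfo = 0.5 * (1 + np.sin(2 * np.pi * rate_hz * n / SR))   # 0..1 sweep
    delay = (max_delay_ms / 1000.0) * SR * lfo               # delay in samples
    idx = np.clip((n - delay).astype(int), 0, len(x) - 1)    # delayed copy
    return (1 - mix) * x + mix * x[idx]

t = np.arange(SR * 2) / SR
dry = np.sin(2 * np.pi * 220.0 * t)
wet = flanger(dry)
```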
The reflection of a surface of a layer determines the amount, and type, of reverberation applied to a layer, or an object. Such surface type effects may or may not include, but are not limited to, diffusion, emission, specular, and opacity. Such reverb sound effect parameters may or may not include, but are not limited to, size, outdoor/indoor, and delay. A strobe/pulsate light effect applied to an audio transform will affect the sound at different intervals depending on the rate of strobe.
This could be applied at a beat ratio of 1:1, 1:2, 1:4, 1:8, or 1:16. The hollowness of the core of an object determines the amount of audio DSP echo effect applied; more echo is applied to objects that are hollower.
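A minimal sketch of the hollowness-to-echo relationship follows, using a feedback delay whose wet level and feedback grow with a normalised hollowness value; the delay time and scaling constants are assumptions.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumed)

def apply_echo(x: np.ndarray, hollowness: float, delay_s: float = 0.3):
    """Feedback echo whose wet level and feedback grow with hollowness (0..1):
    hollower objects receive more echo, as described above."""
    d = int(delay_s * SR)
    feedback = 0.2 + 0.6 * hollowness
    wet = 0.1 + 0.7 * hollowness
    out = np.copy(x).astype(float)
    for i in range(d, len(out)):
        out[i] += feedback * out[i - d]           # recirculating delay line
    return (1 - wet) * x + wet * (out - x)        # mix dry with echoes only

t = np.arange(SR) / SR
echoed = apply_echo(np.sin(2 * np.pi * 220.0 * t), hollowness=0.8)
```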
vi) Physics
A magnetism value can be applied to objects. This determines how objects may interact in proximity to other objects, if used in a live environment (such as described in Ambisonic Soundstage). Polarisation would mean objects would either attract or repel. The strength of magnetism applied would determine the force and distance such an effect would have on neighbouring objects.
A gravity value may be applied to an environment(s), or specified section(s) of an environment. The gravity value would affect how any objects within the environment(s), or section(s) of the environment, behave, as they would in nature, dependent on the properties of the object. For example, this may be, but is not limited to, gravity becoming lighter and object(s) floating, or gravity becoming heavier and object(s) dropping. Objects may or may not float out of the audible area of the environment, or may rest on or bounce off the floor, depending on the gravity value applied and the material value given to the object and to the floor of the environment.
Other environmental values may affect such behaviours, for example, but not limited to, the state and material of the environment itself.
A temperature value may be applied to an environment(s), or specified section(s) of an environment. The temperature value would affect how any objects within the environment(s), or section(s) of the environment, behave, as they would in nature, dependent on the properties of the object(s). For example, this may be, but is not limited to, temperature decreasing and object(s) condensing or solidifying, or temperature increasing and object(s) melting or evaporating. Other environmental values may affect such behaviours, for example, but not limited to, the state and material of the environment itself.
Other miscellaneous properties/conditions may include, but are not limited to: pressure, humidity, atmospheric electricity, viscosity, half-life, buoyancy, and absorbency. The object(s) and output sound(s) will be affected naturally as such first data elements adopt such environmental condition data changes. These conditions may or may not also include natural and unnatural phenomena, such as, but not limited to, electromagnetic pulse (EMP), lunar changes, planetary alignments, radiation, solar flares, and earthquake bombs.
Different areas of the environment may be partitioned and assigned different environmental conditions.
Software mechanics such as artificial intelligence (AI) characteristics and behaviours may be applied to objects. Such characteristics may include proximal defence and attack response mechanisms, where AV outputs would be defined by the combination of present audio and visual variables and the interaction between two or more objects being in proximity/contact. This may also include applying predefined paths for objects to follow, as described in Fig. 4. Other such AI may be relative to environmental changes, such as temperature and humidity, which may change or fluctuate over a predefined time or event.
The described environment refers to standard conditions, such as, but not limited to, room temperature and pressure, and Earth's gravitational force and oxygen-rich gaseous environment. However, the environment atmosphere can be fully modified, for example, but not limited to, any of the aforementioned properties/conditions, as well as state and matter (solid(s), liquid(s), gas(es)), which may include, but is not limited to, natural or unnatural/real or imaginary elements/materials (e.g. red mercury). Similarly, objects residing in the environment are affected by such changes as they would be affected in nature.
A computer program product adapted to perform the above method, or a computer-readable storage medium or data carrier comprising the program may be provided.
These and other embodiments will be apparent to those skilled in the art.
Claims (15)
- 1. Data comprehension method, comprising: Inputting at least a first data element; Displaying the first data element in a visual format; Manipulating the first data element in the visual format using pre-defined gestures to create a modified data element; and Outputting the modified data element aurally.
- 2. Method of claim 1, further comprising the steps of: Assigning a first object to the first data element; Displaying the first data element visually in the form of the first object, such that the pre-defined gestures manipulate the object; and Assigning an aural component to the first object.
- 3. Method of claim 2, wherein when more than a first data element is input, assigning an object to each data element and assigning an aural component to each object.
- 4. Method of claims 2 or 3, wherein when the object is manipulated, the aural component varies in accordance with the manipulation.
- 5. Method of any preceding claim, wherein the first data element represents a physical, optical, tactile, odorous or aural variable.
- 6. Method of any preceding claim, wherein the modified data element is output aurally via at least one sound emitting device.
- 7. Method of any of claims 1 to 5, wherein the first data element represents a sound, and the modified data element forms part of a soundstage.
- 8. Method of any preceding claim, wherein the visual format is in 3-D.
- 9. Method of claim 8, wherein the gestures are enacted in 3-D.
- 10. Method of any of claims 1 to 7, wherein the visual format is in 2-D.
- 11. Method of claim 10, wherein the visual format is displayed on a touch-sensitive screen device.
- 12. Method of claim 10, wherein the visual format is displayed on a screen, and the first data element is manipulated via a remote touch sensitive device.
- 13. Method of any of claims 2 to 12, wherein the object is rendered in the form of voxels, and the gesture influences the voxel nodes.
- 14. A computer program product adapted to perform the method of any of claims 1 to 13.
- 15. A computer-readable storage medium or data carrier comprising the program of claim 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1419740.4A GB2532034A (en) | 2014-11-05 | 2014-11-05 | A 3D visual-audio data comprehension method |
PCT/GB2015/053355 WO2016071697A1 (en) | 2014-11-05 | 2015-11-05 | Interactive spherical graphical interface for manipulaton and placement of audio-objects with ambisonic rendering. |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1419740.4A GB2532034A (en) | 2014-11-05 | 2014-11-05 | A 3D visual-audio data comprehension method |
Publications (2)
Publication Number | Publication Date |
---|---|
GB201419740D0 GB201419740D0 (en) | 2014-12-17 |
GB2532034A true GB2532034A (en) | 2016-05-11 |
Family
ID=52118772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1419740.4A Withdrawn GB2532034A (en) | 2014-11-05 | 2014-11-05 | A 3D visual-audio data comprehension method |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB2532034A (en) |
WO (1) | WO2016071697A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10292001B2 (en) | 2017-02-08 | 2019-05-14 | Ford Global Technologies, Llc | In-vehicle, multi-dimensional, audio-rendering system and method |
GB2575840A (en) * | 2018-07-25 | 2020-01-29 | Nokia Technologies Oy | An apparatus, method and computer program for representing a sound space |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10466960B2 (en) * | 2018-04-02 | 2019-11-05 | Avid Technology, Inc | Augmented reality audio mixing |
CN113039509B (en) * | 2018-11-21 | 2024-08-23 | 谷歌有限责任公司 | Apparatus and method for providing context awareness using position sensors and virtual acoustic modeling |
CN112598742A (en) * | 2020-12-30 | 2021-04-02 | 杭州电子科技大学 | Stage interaction system based on image and radar data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6490359B1 (en) * | 1992-04-27 | 2002-12-03 | David A. Gibson | Method and apparatus for using visual images to mix sound |
JP4973919B2 (en) * | 2006-10-23 | 2012-07-11 | ソニー株式会社 | Output control system and method, output control apparatus and method, and program |
EP2450880A1 (en) * | 2010-11-05 | 2012-05-09 | Thomson Licensing | Data structure for Higher Order Ambisonics audio data |
JP5798247B2 (en) * | 2011-07-01 | 2015-10-21 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Systems and tools for improved 3D audio creation and presentation |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080229200A1 (en) * | 2007-03-16 | 2008-09-18 | Fein Gene S | Graphical Digital Audio Data Processing System |
US20110271186A1 (en) * | 2010-04-30 | 2011-11-03 | John Colin Owens | Visual audio mixing system and method thereof |
Non-Patent Citations (1)
Title |
---|
http://createdigitalmusic.com/2012/02/fract-3d-adventure-game-played-with-synths-and-sequencers-myst-meets-music-making/ - retrieved 28/5/15 * |
Also Published As
Publication number | Publication date |
---|---|
GB201419740D0 (en) | 2014-12-17 |
WO2016071697A1 (en) | 2016-05-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |