WO2017119915A1

WO2017119915A1 - Method and apparatus for converting audio data into a visual representation

Info

Publication number: WO2017119915A1
Application number: PCT/US2016/014420
Authority: WO
Inventors: Zi Hao QIU
Original assignee: Qiu Zi Hao
Priority date: 2016-01-04
Filing date: 2016-01-22
Publication date: 2017-07-13
Also published as: CN105810209A

Abstract

An approach is provided for converting audio data into a visual representation. The approach involves determining one or more characteristic values from audio data. The one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes. The approach also involves mapping the one or more musical notes to a first visual characteristic that includes a color, a pattern, a design, or a combination thereof of the one or more visual elements. The approach also involves mapping the respective durations to a second visual characteristic of the one or more visual elements. The second visual characteristic includes a size of the one or more visual elements. The approach further involves generating a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

Description

METHOD AND APPARATUS FOR CONVERTING AUDIO DATA INTO A VISUAL REPRESENTATION

RELATED APPLICATION

[0001] This application claims priority benefit to Chinese Patent Application Serial No. 201610003490.X, entitled "A Data Conversion Method Based on (Object) Relational Mapping", filed January 4, 2016, which is herein incorporated by reference in its entirety.

TECHNICAL FIELD

[0002] The present application relates to data processing, and in particular to technology for converting between audio and visual data.

BACKGROUND

[0003] To some extent, the popularity of computers has helped the creation of music and art. However, current audio and image conversion technologies have many potential shortcomings. For example, for the average person, it is often difficult to translate a piece of music directly into a visual representation (e.g., a painted work) because the person may lack sufficient artistic or musical skill. To create such a conversion would normally require a subjective viewpoint since creators generally need to understand the painting as an accomplishment in itself. For example, understanding music and music theory as well as other cross-disciplinary artistic aspects can be hard to master, and so the conversion between music and a visual representation (e.g., a painting with artistic value) also can be difficult to achieve. Accordingly, there are significant technical challenges associated with converting between audio and visual data.

SOME EXAMPLE EMBODIMENTS

[0004] Therefore, there is a need for an approach for converting audio data into a visual representation.

[0005] According to one embodiment, a method comprises determining one or more characteristic values from audio data. The one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes. The method also comprises mapping the one or more musical notes to a first visual characteristic of one or more visual elements. The first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements. The method also comprises mapping the respective durations to a second visual characteristic of the one or more visual elements. The second visual characteristic includes a size of the one or more visual elements. The method further comprises generating a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

[0006] According to another embodiment, a method comprises processing a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation. The first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements. The second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements. The method also comprises generating an audio sequence based on the one or more musical notes and the respective durations. The method further comprises initiating a playback of the audio sequence.

[0007] According to another embodiment, an apparatus comprises at least one processor, and at least one memory including computer program code for one or more computer programs, the at least one memory and the computer program code configured to, with the at least one processor, cause, at least in part, the apparatus to determine one or more characteristic values from audio data. The one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes. The apparatus is also caused to map the one or more musical notes to a first visual characteristic of one or more visual elements. The first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements. The apparatus is also caused to map the respective durations to a second visual characteristic of the one or more visual elements. The second visual characteristic includes a size of the one or more visual elements. The apparatus is further caused to generate a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

[0008] According to another embodiment, an apparatus comprises at least one processor, and at least one memory including computer program code for one or more computer programs, the at least one memory and the computer program code configured to, with the at least one processor, cause, at least in part, the apparatus to process a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation. The first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements. The second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements. The apparatus is also caused to generate an audio sequence based on the one or more musical notes and the respective durations. The apparatus is further caused to initiate a playback of the audio sequence.

[0009] According to another embodiment, a computer-readable storage medium carries one or more sequences of one or more instructions which, when executed by one or more processors, cause, at least in part, an apparatus to process a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation. The first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements. The second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements. The apparatus is also caused to generate an audio sequence based on the one or more musical notes and the respective durations. The apparatus is further caused to initiate a playback of the audio sequence.

[0010] According to another embodiment, an apparatus comprises means for determining one or more characteristic values from audio data. The one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes. The apparatus also comprises means for mapping the one or more musical notes to a first visual characteristic of one or more visual elements. The first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements. The apparatus also comprises means for mapping the respective durations to a second visual characteristic of the one or more visual elements. The second visual characteristic includes a size of the one or more visual elements. The apparatus also comprises means for generating a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

[0011] According to another embodiment, an apparatus comprises means for processing a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation. The first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements. The second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements. The apparatus also comprises means for generating an audio sequence based on the one or more musical notes and the respective durations. The apparatus further comprises means for initiating a playback of the audio sequence.

[0012] In addition, for various example embodiments of the invention, the following is applicable: a method comprising facilitating a processing of and/or processing (1) data and/or (2) information and/or (3) at least one signal, the (1) data and/or (2) information and/or (3) at least one signal based, at least in part, on (or derived at least in part from) any one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.

[0013] For various example embodiments of the invention, the following is also applicable: a method comprising facilitating access to at least one interface configured to allow access to at least one service, the at least one service configured to perform any one or any combination of network or service provider methods (or processes) disclosed in this application.

[0014] For various example embodiments of the invention, the following is also applicable: a method comprising facilitating creating and/or facilitating modifying (1) at least one device user interface element and/or (2) at least one device user interface functionality, the (1) at least one device user interface element and/or (2) at least one device user interface functionality based, at least in part, on data and/or information resulting from one or any combination of methods or processes disclosed in this application as relevant to any embodiment of the invention, and/or at least one signal resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.

[0015] For various example embodiments of the invention, the following is also applicable: a method comprising creating and/or modifying (1) at least one device user interface element and/or (2) at least one device user interface functionality, the (1) at least one device user interface element and/or (2) at least one device user interface functionality based at least in part on data and/or information resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention, and/or at least one signal resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.

[0016] In various example embodiments, the methods (or processes) can be accomplished on the service provider side or on the mobile device side or in any shared way between service provider and mobile device with actions being performed on both sides.

[0017] For various example embodiments, the following is applicable: An apparatus comprising means for performing the method of any of the claims.

[0018] Still other aspects, features, and advantages of the invention are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the invention. The invention is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings:

[0020] FIG. 1 is a diagram of a system capable of converting audio data into a visual representation, according to one embodiment;

[0021] FIG. 2 is a diagram of a piano keyboard used in the various processes described herein, according to one embodiment;

[0022] FIGs. 3A and 3B are diagrams of example audio data provided as a musical score in standard notation for conversion, according to various embodiments;

[0023] FIGs. 4A-4C are diagrams of example visual representations of audio data, according to various embodiments;

[0024] FIG. 5 is a flowchart of a process for converting audio data into a visual representation, according to one embodiment;

[0025] FIG. 6 is a flowchart of a process for determining an audio sequence and selecting a color space for a visual representation of audio data, according to one embodiment;

[0026] FIG. 7 is a flowchart of a process for representing duration of musical notes in audio data using a size of a visual element, according to one embodiment;

[0027] FIG. 8 is a flowchart of a process for generating a legend of presentation in a visual representation of audio data, according to one embodiment;

[0028] FIG. 9 is a flowchart of a process for converting a visual representation of audio data into an audio sequence for playback, according to one embodiment;

[0029] FIG. 10 is a diagram of hardware that can be used to implement an embodiment of the invention;

[0030] FIG. 11 is a diagram of a chip set that can be used to implement an embodiment of the invention; and

[0031] FIG. 12 is a diagram of a mobile terminal (e.g., handset) that can be used to implement an embodiment of the invention.

DESCRIPTION OF SOME EMBODIMENTS

[0032] Examples of a method, apparatus, and computer program for converting audio data into a visual representation (and vice versa) are disclosed. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It is apparent, however, to one skilled in the art that the embodiments of the invention may be practiced without these specific details or with an equivalent arrangement. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the embodiments of the invention.

[0033] The various embodiments of this method, apparatus, and computer program relate to an audio conversion technique between audio and visual data. On the one hand this method could be applied to turning music and songs into visual representations (e.g., paintings), but on the other hand could be equally applied to visual representations (e.g., displaying color bars and scales) to transform them into audio data (e.g., music). By way of example, these embodiments can be applied to the industries of communications, video, composing, teaching, games and other computer-related fields.

[0034] FIG. 1 is a diagram of a system capable of converting audio data into a visual representation, according to one embodiment. As noted above, the process for converting audio data to visual data (e.g., a painting) traditionally has relied on expertise in cross-disciplinary artistic concepts (e.g., music theory, composition, etc.) combined with artistic and musical skill to achieve subjectively pleasing or "good" results. However, this knowledge and skill is often out of reach for average users, thereby limiting the ability of these users to convert between audio data and visual data while achieving artistic results.

[0035] In light of this problem, in one embodiment, a system 100 of FIG. 1 introduces a capability to convert audio data into visual representations or images by, for instance, obtaining one or more characteristic values from the audio data; determining one or more characteristics for converting from audio to visual; obtaining a color set (or pattern or design set) based on the relationships mapped for these characteristics, the set including a range of colors, patterns, and/or designs; and then generating a visual representation (e.g., an image) using an algorithmic process as discussed with respect to the various embodiments described herein.

[0036] For example, in one embodiment, the generating of the visual representation can be based on color spaces to achieve artistic color composition. By way of example, a color space is a specific organization of colors that can have values matched with each color in sequence, which can then be further matched to a tone or musical note in a tonal sequence extracted from audio data. The matching of the color sequence to the tonal or audio sequence can create, for instance, a continuous matching sequence of numbers connecting colors and tones. In this way, the generated visual representation of audio data provides a visual and intuitive form of image creation (e.g., an intuitive form of painting) that can express a musical sense of hearing by matching data between images and sounds, where the visual characteristics of the image express characteristics of the audio data (e.g., musical notes and durations of those notes, rhythm, etc.).

[0037] In other words, in one embodiment, the system 100 converts audio data into a visual representation by the following means: (1) obtaining one or more features from the audio data's characteristics (e.g., musical notes and durations of those notes); (2) corresponding or mapping these characteristics to values in a color space; (3) corresponding or mapping a spectrum of colors with the values from the color space; and (4) generating a visual representation or image based on these color values.
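
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the 12-entry `PALETTE`, the `VisualElement` type, the `cells_per_beat` parameter, and the choice to render a single row of pixels are all hypothetical, since the description does not fix concrete color values or a layout.

```python
from dataclasses import dataclass

# Hypothetical palette standing in for an ordered slice of a color space.
# The source does not specify actual color values.
PALETTE = [
    (255, 0, 0), (255, 128, 0), (255, 255, 0), (128, 255, 0),
    (0, 255, 0), (0, 255, 128), (0, 255, 255), (0, 128, 255),
    (0, 0, 255), (128, 0, 255), (255, 0, 255), (255, 0, 128),
]


@dataclass
class VisualElement:
    color: tuple  # RGB triple (first visual characteristic)
    width: int    # size in unit cells (second visual characteristic)


def notes_to_elements(notes, cells_per_beat=4):
    """Map (note_index, beats) pairs to colored, sized visual elements."""
    elements = []
    for note_index, beats in notes:
        # Steps (2) and (3): note value -> position in the color sequence -> color.
        color = PALETTE[note_index % len(PALETTE)]
        # Duration -> size, with at least one cell per note (an assumption).
        width = max(1, round(beats * cells_per_beat))
        elements.append(VisualElement(color, width))
    return elements


def render_row(elements):
    """Step (4): render the elements as one row of RGB pixels."""
    row = []
    for element in elements:
        row.extend([element.color] * element.width)
    return row


melody = [(0, 1.0), (4, 0.5)]  # step (1) is assumed already done upstream
row = render_row(notes_to_elements(melody))
```

Here a one-beat note occupies four cells and a half-beat note two, so the row is six pixels wide.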

[0038] In one embodiment, the determination of the features to be used for mapping is based on an analysis of one or more audio characteristics (e.g., musical note or tone, and a duration of the note or tone) to generate the audio sequence. In one embodiment, the audio and color values (or pattern or design values) are matched to a logical sequence between them.

[0039] In one embodiment, the sequence of color values is determined from any available or published color space including, but not limited to: Pantone, RAL color space, DIC color space, ISO color space, Chinese building color space, NCS color space, Munsell color space, and digital X-Rite color space. In one embodiment, the mapping enables the system 100 to take certain color values from the set and transform them into the corresponding sequence. In one embodiment, the transformation can also include shifting a color sequence up or down the range of colors specified in the color space.
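
One way to picture the shift described above is as an offset applied to each note's index before it is looked up in the ordered color sequence. The sketch below is illustrative only: the function name `map_notes_to_colors`, the `shift` parameter, and the decision to clamp out-of-range indices to the ends of the sequence are assumptions not stated in the description.

```python
def map_notes_to_colors(note_indices, color_sequence, shift=0):
    """Pair each note index with a color, offset by `shift` positions.

    Indices that would fall outside the sequence are clamped to its
    first or last entry (an assumption; wrapping would be another option).
    """
    last = len(color_sequence) - 1
    return [
        color_sequence[min(max(index + shift, 0), last)]
        for index in note_indices
    ]


# Placeholder labels standing in for entries of a published color space.
colors = ["c0", "c1", "c2", "c3", "c4"]
unshifted = map_notes_to_colors([0, 1, 2], colors)
shifted_up = map_notes_to_colors([0, 1, 2], colors, shift=2)
```

With these inputs the same three notes map to the first three colors when unshifted and to the last three when shifted up by two.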

[0040] In yet another embodiment, the system 100 can further use elements or characteristics isolated or extracted from the audio data to match the color to its appropriate partner and adjust the image. In one embodiment, the image is constructed using one or more visual elements (e.g., geometric shapes) that represent an extracted tone. In one embodiment, each depicted visual element in the final image or visual representation represents an individual musical note or tone. The visual characteristics of the visual element are then determined based on the audio characteristic values of the extracted musical note. For example, the color characteristic of the visual element can be selected to represent the determined tone or musical note (e.g., the frequency), and a size of the element can be used to represent the duration of the note.
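
Where the extracted tone is available as a frequency, it first has to be turned into a discrete note before a color can be looked up. The description does not say how this is done; the sketch below assumes standard twelve-tone equal temperament with A4 = 440 Hz and the conventional 88-key piano numbering (A0 = key 1, A4 = key 49). The function name `frequency_to_piano_key` is hypothetical.

```python
import math


def frequency_to_piano_key(freq_hz):
    """Return the nearest key number on an 88-key piano for a frequency.

    Assumes equal temperament tuned to A4 = 440 Hz, where each semitone
    multiplies the frequency by 2 ** (1 / 12) and A4 is key 49.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    semitones_from_a4 = round(12 * math.log2(freq_hz / 440.0))
    return semitones_from_a4 + 49


middle_c_key = frequency_to_piano_key(261.63)  # middle C (C4)
```

The resulting key number can then serve as the index into a color sequence, so that each key of the keyboard has a fixed color.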

[0041] In one embodiment, the system 100 provides a method for converting audio data into a visual representation using a computer system comprising at least one central processing unit (CPU), a computer-readable storage medium, and a computer capable of executing the following instructions using data from this medium: (1) determining one or more characteristic values from the audio data; (2) setting the selected characteristics against a set of values; (3) selecting a color space with color values that correspond or are mapped to the audio characteristic values; and (4) generating a visual representation or image based on these color values.

[0042] In one embodiment, the system 100 provides for converting a visual representation or image back into audio data by the following method: (1) isolating one or more sets of characteristic features from the image; (2) corresponding these characteristics with values in a color space; (3) corresponding a spectrum of sounds with the values from the color space; (4) corresponding respective sizes of elements isolated from the image to durations for the sounds or notes; (5) generating audio data based on these sound values; and (6) optionally initiating playback of these sounds.
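
Steps (1) through (5) of this reverse conversion can be sketched as a run-length decoding of a row of pixels. This is an illustration under assumptions, not the disclosed method: the row-of-pixels input, the `color_to_note` lookup table, and the `cells_per_beat` parameter are hypothetical, and playback (step 6) is left out.

```python
from itertools import groupby


def row_to_notes(row, color_to_note, cells_per_beat=4):
    """Decode a row of RGB pixels into (note, beats) pairs.

    Each run of identical pixels is treated as one visual element:
    its color selects the note and its length gives the duration.
    """
    notes = []
    for color, run in groupby(row):
        length = sum(1 for _ in run)  # size of the element in cells
        notes.append((color_to_note[color], length / cells_per_beat))
    return notes


# Hypothetical two-color legend and a six-pixel row.
legend = {(255, 0, 0): "C", (0, 255, 0): "E"}
pixels = [(255, 0, 0)] * 4 + [(0, 255, 0)] * 2
sequence = row_to_notes(pixels, legend)
```

One limitation of this simple decoding is that two adjacent elements of the same color merge into a single longer note; a fuller version would need some separator between elements to tell repeated notes apart.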

[0043] In summary, in one embodiment, the system 100 utilizes a set of colors (such as those included in a selected color space) to create a value that will correspond to each color. This creates a logical relationship between the tones in a musical scale and the various shades of color available, as well as a logical relationship between the sizes of the visual elements representing the tones or notes and the duration of the tone in an audio sequence. In this way, the system 100 can create logical sequences for visual and audio data that can correspond to one another. In addition, the system 100 can provide a process to convert images back to audio data so that the visual and intuitive art of painting can be replicated to auditory senses.

[0044] In this way, knowledge of music or artistic skill is not essential for appreciating the links between music and art, and we can enhance the interaction between visual and auditory art using value sets such as those establishing a mapping between tonal values and color/pattern/design values, as well as a mapping between tonal durations and size values for visual elements in an image.

[0045] In one embodiment, visual representations such as paintings can be derived from music, allowing viewers to recognize well-known tunes by reading the color sequence, so as to visualize the music. When initially setting default values without any initial music input, a color continuum can be chosen and values assigned to each shade, to be assigned to the musical notes at a later stage. Note that the different color values allocated in the code can then be applied to a wide variety of music to create the ideal expression. In this way, music adds another sensory dimension, adding richness and interest. Art and music can feed into one another, boosting their inspiration and creation to new heights.

[0046] In one embodiment, converting audio data to visual data can be expressed in various forms of color, pattern and art. For example, the musical notes included in the audio data can have a color value assigned to them, allowing their expression in art. The painting or visual representation itself may contain different media such as watercolors, gouache, acrylic, oil painting and others. Depending on the sounds, notes, volume and pitch from the music, the system 100 can select certain colors, patterns, or designs to use. In this way, the system 100 can express music in the form of a painting or other visual representation.

[0047] In one embodiment, various phonetic patterns can be configured to use different colors for expression, and so from every piece of music colored blocks can form a pattern and then a picture to be shown on a surface. For each note in the music, both colors and patterns can be created. At the same time, the system can use different colors, backgrounds and surfaces to render the same song. Therefore the same piece of music can generate a different range of colors and create different pictures. Examples of features from the audio data may include: the note code as derived from its location, the time value (where the size has a fixed ratio with cell size for the visual version) and the basic rhythm, where the percentage of cells colored will be relative to the time value.

[0048] In one embodiment, certain parameters will be included in calculating this algorithm, including the style of the music's melody and the color space, and the algorithm itself will be a product of the basic elements of the music: the sound level, pitch, notes, time value or duration, etc. In one embodiment, these values will determine the relationship between the code and space allocated during the production of the image.

[0049] In one embodiment, a change in detected melodic style will result in a different arrangement of strokes in the data. According to the data retrieved from each musical tone, the system 100 can build relationships with individual colors, and therefore variations in melody and pitch will determine the choice of colors for the finished visual representation.

[0050] In one embodiment, the individual characteristics of the music notes will have a fixed relationship with each basic color, including variations in melody, pitch and volume. These will determine the appropriate value in the color space and thereby a suitable color for expression. The sounds used will range within the boundaries dictated by the magnitude of this variation. In one embodiment, the time value or note duration is identified by an equal proportion of unit cells being allocated to each note. Each note will generate a color bar or other visual element whose area is proportional to its time value.
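
The proportional allocation of unit cells can be illustrated with a short sketch. The description only states that area is proportional to time value; how fractional cells are handled is not specified, so the largest-remainder rounding used here, along with the name `allocate_cells` and the fixed `total_cells` budget, are assumptions made for the example.

```python
def allocate_cells(durations, total_cells):
    """Split `total_cells` unit cells among notes in proportion to duration.

    Uses largest-remainder rounding so the counts are whole cells that
    always add up to `total_cells` (an assumed tie-breaking scheme).
    """
    total_duration = sum(durations)
    if total_duration <= 0:
        raise ValueError("durations must sum to a positive value")
    exact = [d * total_cells / total_duration for d in durations]
    cells = [int(x) for x in exact]  # whole cells each note gets for sure
    leftover = total_cells - sum(cells)
    # Hand the remaining cells to the notes with the largest fractions.
    by_fraction = sorted(
        range(len(durations)),
        key=lambda i: exact[i] - cells[i],
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        cells[i] += 1
    return cells


# Quarter, eighth, eighth, half note (in beats) on a 16-cell canvas.
bars = allocate_cells([1, 0.5, 0.5, 2], 16)
```

In this example the durations divide the canvas evenly, so the four notes receive 4, 2, 2, and 8 cells respectively.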

[0051] In one embodiment, according to the basic media and techniques employed in creating the artwork, the texture of the music will also be altered. For example using oil paints, watercolors, acrylic or other pigments may change the notes, pitch or timing as a function of the basic characteristics of the data. Finally the artist can touch up the picture by adding individual elements like graphics and different artistic forms like strokes and textures. Music can be chosen with specific purpose to create visual art, with a specific range of melodic rhythms and tones creating a range of images using similar colors with a different rhythm or vice versa. In this way, the embodiments of this invention may be incorporated into this artistic process: it is a creative process, allowing a second creation by which the creator and audience convert one creation into another. This allows an interaction between the viewer and the art.

[0052] As shown in FIG. 1, the system 100 comprises one or more user equipment (UE) 101 having connectivity to an audio/visual conversion platform 103, via a communication network 105. In one embodiment, the audio/visual conversion platform 103 performs one or more functions for converting audio data into a visual representation (and vice versa) as discussed with respect to the various embodiments described herein. In addition or alternatively, the UE 101 may execute an audio/visual data conversion application 107 to perform one or more functions for converting audio data into a visual representation.

[0053] In one embodiment, the UE 101 further has connectivity to one or more input/output devices 109 for ingesting audio or image data or for generating audio or image data. For example, for ingesting audio or image data, the input/output device 109 may include a microphone for sampling audio, or a camera or scanner for capturing visual audio data including visual representations 111 generated according to the various embodiments described herein. It is contemplated that the input/output device may be configured with any sensor suitable for sampling or capturing audio and/or visual data into digital format for processing by the system 100.

[0054] In one embodiment, the type of sensor configured can be based on the type of source data. For example, it is contemplated that audio data can include audio data presented in any form. If audio data is present in the form of musical notation in a song book, for instance, the input/output device 109 can use a scanning device or camera to capture images of the musical notation in the song book for conversion into audio data (e.g., data comprising musical tones or notes and their respective durations). The system 100 can then process the images to extract the audio data through image recognition techniques. In another example, if the audio data is audible data (e.g., live music or music played over speakers), the input/output device 109 can use a microphone to capture audio samples. The system 100 can then process the audio samples using audio recognition or other similar techniques to determine the tones or notes played and their respective durations.

[0055] In one embodiment, for outputting audio or image data, the input/output device 109 can be configured with any number of suitable output modules. For example, to output visual data (e.g., images or other visual representations), the input/output device 109 may be configured with displays (e.g., monitors, projectors, televisions, etc.) to present visual representations 111. For example, a display can be mounted on a wall to present the converted audio data as an image. In addition, the input/output device 109 may include devices for creating physical versions (e.g., paper, canvas, and/or other media such as wood, stone, etc.) of the visual representations 111. These devices include, but are not limited to, printers, three-dimensional printers, computerized numerical control (CNC) machines, printing presses, and the like. Similarly, to output audio data, the input/output device 109 can be configured with an audio playback system.

[0056] In one embodiment, the visual representations 111 can embody any electronic or physical form. For example, electronic forms can include images, videos, three-dimensional models, etc. Similarly, physical forms of the visual representations can be in any media or material including, but not limited to, wood, metal, clothes, fabric, collages, etc. of various colors or composition. In one embodiment, these physical forms can be directly generated through appropriate output devices (e.g., printers or other automated means). In another embodiment, the system 100 can provide an output listing of instructions (e.g., color selections, schematics, brush stroke suggestions, etc.) for a user to manually create the visual representation through an artistic medium (e.g., painting, sculpture, etc.). In yet another embodiment, it is contemplated that the visual representation can be imprinted on or otherwise depicted on any article of manufacture including, but not limited to, clothes or other products (e.g., souvenirs, etc.) composed of any material or medium.

[0057] In one embodiment, the input/output device 109 can include a piano keyboard or other similar instrument configured with lighting of different colors (e.g., multi-color LED lighting) that is fixed onto the keyboard. In this way, the keyboard can display appropriate colors corresponding to the notes of the keys when a piece of music is recorded or played back on the keyboard. In this way, a musician playing the keyboard can create a visual image or painting by playing the keys corresponding to the colors that the musician wants to appear in a created image.

[0058] In one embodiment, the input/output device 109 can include a "reading pen" that is configured with a sensor module capable of reading color values. In this way, a user can create a song or other audio data by using the pen to read different colors (e.g., from an existing image, painting, or other visual representation). The colors that are read by the pen are then converted into audio data using the processes discussed with respect to the various embodiments described herein.

[0059] In one embodiment, the UE 101 and/or the audio/visual conversion platform 103 also have connectivity to a service platform 113 that includes one or more services 115a-115m (also collectively referred to as services 115) for providing other media services and/or other services that support the audio/visual conversion platform 103 (e.g., music, images, cloud storage, printing, content, etc. services). In one embodiment, the service platform 113 and/or services 115 interact with one or more content providers 117a-117k (also collectively referred to as content providers 117) to provide media or artistic information and/or other related information to the audio/visual conversion platform 103.

[0060] By way of example, the communication network 105 of system 100 includes one or more networks such as a data network, a wireless network, a telephony network, or any combination thereof. It is contemplated that the data network may be any local area network (LAN), metropolitan area network (MAN), wide area network (WAN), a public data network (e.g., the Internet), short range wireless network, or any other suitable packet-switched network, such as a commercially owned, proprietary packet-switched network, e.g., a proprietary cable or fiber-optic network, and the like, or any combination thereof. In addition, the wireless network may be, for example, a cellular network and may employ various technologies including enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), Internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS), etc., as well as any other suitable wireless medium, e.g., worldwide interoperability for microwave access (WiMAX), Long Term Evolution (LTE) networks, code division multiple access (CDMA), wideband code division multiple access (WCDMA), wireless fidelity (WiFi), wireless LAN (WLAN), Bluetooth®, Internet Protocol (IP) data casting, satellite, mobile ad-hoc network (MANET), and the like, or any combination thereof.

[0061] The UE 101 is any type of mobile terminal, fixed terminal, or portable terminal including a navigation unit (e.g., in-vehicle or standalone), a mobile handset, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistants (PDAs), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination thereof, including the accessories and peripherals of these devices, or any combination thereof. It is also contemplated that the UE 101 can support any type of interface to the user (such as "wearable" circuitry, etc.).

[0062] By way of example, the UE 101, the audio/visual conversion platform 103, and the audio/visual conversion application 107 communicate with each other and other components of the communication network 105 using well known, new or still developing protocols. In this context, a protocol includes a set of rules defining how the network nodes within the communication network 105 interact with each other based on information sent over the communication links. The protocols are effective at different layers of operation within each node, from generating and receiving physical signals of various types, to selecting a link for transferring those signals, to the format of information indicated by those signals, to identifying which software application executing on a computer system sends or receives the information. The conceptually different layers of protocols for exchanging information over a network are described in the Open Systems Interconnection (OSI) Reference Model.

[0063] Communications between the network nodes are typically effected by exchanging discrete packets of data. Each packet typically comprises (1) header information associated with a particular protocol, and (2) payload information that follows the header information and contains information that may be processed independently of that particular protocol. In some protocols, the packet includes (3) trailer information following the payload and indicating the end of the payload information. The header includes information such as the source of the packet, its destination, the length of the payload, and other properties used by the protocol. Often, the data in the payload for the particular protocol includes a header and payload for a different protocol associated with a different, higher layer of the OSI Reference Model. The header for a particular protocol typically indicates a type for the next protocol contained in its payload. The higher layer protocol is said to be encapsulated in the lower layer protocol. The headers included in a packet traversing multiple heterogeneous networks, such as the Internet, typically include a physical (layer 1) header, a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header, and various application (layer 5, layer 6 and layer 7) headers as defined by the OSI Reference Model.

[0064] In one embodiment, the audio/visual conversion application 107 and the audio/visual conversion platform 103 interact according to a client-server model. It is noted that the client-server model of computer process interaction is widely known and used. According to the client-server model, a client process sends a message including a request to a server process, and the server process responds by providing a service. The server process may also return a message with a response to the client process. Often the client process and server process execute on different computer devices, called hosts, and communicate via a network using one or more protocols for network communications. The term "server" is conventionally used to refer to the process that provides the service, or the host computer on which the process operates. Similarly, the term "client" is conventionally used to refer to the process that makes the request, or the host computer on which the process operates. As used herein, the terms "client" and "server" refer to the processes, rather than the host computers, unless otherwise clear from the context. In addition, the process performed by a server can be broken up to run as multiple processes on multiple hosts (sometimes called tiers) for reasons that include reliability, scalability, and redundancy, among others.

[0065] FIGs. 2-4C illustrate an example of converting audio data (e.g., in the form of sheet music in standard notation) into a visual representation, according to one embodiment. More specifically, FIG. 2 is a diagram of a piano keyboard used in the various processes described herein. FIGs. 3A and 3B are diagrams of example audio data provided as a musical score 301a of FIG. 3A and a musical score 301b of FIG. 3B (also referred to collectively as musical score 301) in standard notation for conversion. FIGs. 4A-4C are diagrams of example visual representations of the audio data depicted in FIGs. 3A and 3B.

[0066] In one embodiment, the conversion process of the system 100 is based on extracting an audio sequence and related characteristics from audio data. By way of example, audio data contains many characteristics, including pitch, tone, rhythm, melody and time value or duration. Generally, melodies are the basic element of music, based on a certain style and tempo to create a tune, according to pitch and time signature. On a basic staff there are seven sound levels: C, D, E, F, G, A and B, and when sung they are represented as do, re, mi, fa, so, la, ti. Each octave recycles these seven tones, which can be raised or lowered to create a richer sound while maintaining the basic tonal value of the original.

[0067] The basic tone can be raised by a semitone "up", indicated by the "#" symbol, or reduced by a semitone "down", represented by "b". Therefore there are an additional five semitones to be included in the octave: C# (Db), D# (Eb), F# (Gb), G# (Ab) and A# (Bb). Seven basic tones plus these five semitones add up to a total of 12 sounds in the octave. This is called the "12 Equal Tone Temperament" and the piano keyboard is based on this law, with each octave of keys (5 black and 7 white) adding up to a total of 12 keys. On the piano, this apparent sequence of tonal sounds is used. Today the largest pianos have eighty-eight different keys, and tones other than these are rarely used.

[0068] Referring to FIG. 2, a schematic view of the piano keyboard 201 is illustrated as a representation of the method.

[0069] The fifty-two white keys on the keyboard 201 repeat the seven basic tones in every octave. Adjacent octaves above and below share the same tone names (or musical notes).

[0070] The keyboard can be divided into three zones: treble, alto, and bass. Each zone consists of seven basic tones and five semitones, comprising a total of 12 tones: C, C# (Db), D, D# (Eb), E, F, F# (Gb), G, G# (Ab), A, A# (Bb) and B. According to this sequence, an Arabic numeral can be allocated to each tone, creating a numerical sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, in essence a music alphabet (see FIG. 2).

[0071] Therefore, the three zones (bass, alto, and treble) can together be encoded as: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36.
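The numbering described above can be sketched as a small lookup. This is only an illustration, not the applicant's implementation: the function names `note_number` and `note_name` are hypothetical, and the assignment of the bass zone to 1-12 and the treble zone to 25-36 is an assumption (the description only states elsewhere that the midrange tones are encoded as 13-24).

```python
# The 12 tones of one zone, in the order listed in the description.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Flat spellings are treated as aliases of the equivalent sharps.
FLAT_ALIASES = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}

# Assumed zone order: bass = 1-12, alto = 13-24, treble = 25-36.
ZONES = ["bass", "alto", "treble"]


def note_number(name, zone):
    """Return the 1-36 code for a tone name within a zone."""
    name = FLAT_ALIASES.get(name, name)
    return ZONES.index(zone) * 12 + PITCH_CLASSES.index(name) + 1


def note_name(number):
    """Return (tone name, zone) for a 1-36 code."""
    if not 1 <= number <= 36:
        raise ValueError("note number must be between 1 and 36")
    zone_index, pitch_index = divmod(number - 1, 12)
    return PITCH_CLASSES[pitch_index], ZONES[zone_index]
```

Under these assumptions, `note_number("C", "alto")` gives 13 and `note_name(36)` gives `("B", "treble")`, so a melody can be stored as a plain list of integers.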

[0072] As shown in FIG. 2, the lower or higher a tone is in relation to the treble zone the number can be moved a column up or down in accordance with the code. Therefore you can acquire an audio sequence (tonal sequence, e.g., a sequence of the audio frequencies of the musical notes) from the audio data.

[0073] In one embodiment, the system 100 takes features or characteristics from the audio data (such as the note numbers - see FIG. 2) and assigns color values from the color set, for example the numerical value associated with each shade. In one embodiment, the system 100 also extracts a characteristic related to the duration of each note or tone from the audio data, thereby allowing the audio data to be expressed in painted form using color, pattern, or design of a visual element to represent a tone or musical note and a size of the visual element to represent a duration of the note.

[0074] In addition, the system 100 enables eliciting of music from the color sets or other visual representations that encode note/tone and duration as described above by trying to match the color bars (or other color visual elements in the visual representation or image) to the corresponding notes and durations. This can be used to generate an audio expression that corresponds to the visual data (e.g., once the matches have been adjusted and filtered). In this way, a painting or other visual representation can be used as a means for composing music.
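The reverse direction, from color cells back to notes and durations, might be sketched as a nearest-color lookup. This is an illustration under stated assumptions: the description does not say how colors are matched, so squared distance in RGB is a stand-in technique, and the names `nearest_note` and `decode_cells` as well as the three-entry palette in the usage note are hypothetical.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of integers."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def nearest_note(color_hex, palette):
    """Return the note number whose palette color is closest to color_hex.

    palette maps note numbers to '#RRGGBB' strings. Closeness is squared
    Euclidean distance in RGB, an assumed metric.
    """
    target = hex_to_rgb(color_hex)

    def distance(item):
        candidate = hex_to_rgb(item[1])
        return sum((c - t) ** 2 for c, t in zip(candidate, target))

    note, _ = min(palette.items(), key=distance)
    return note


def decode_cells(cells, palette, unit_area=1.0):
    """Turn (color, area) cells into (note number, duration in units) pairs."""
    return [(nearest_note(color, palette), area / unit_area) for color, area in cells]
```

For example, with `palette = {13: "#FFFF00", 14: "#FF0000", 15: "#800080"}`, a slightly off-red reading such as `"#FE0102"` from a reading pen would still resolve to note 14.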

[0075] In one embodiment, color cards form components of a continuum of colors in a collection. It is contemplated that although the various embodiments are discussed with mapping tones or notes to color, these tones or notes can also be mapped to patterns or designs. In addition, color, patterns, and designs can be combined in any combination. For example, a huge range of colors, patterns, and designs are available, and the system 100 can choose any desired color and acquire a completely standardized and digitalized marker for the color. In one embodiment, such specific markers are important in order to generate a single tone as perfectly matched to it as possible. With these sorts of cards (e.g., taken from a selected color space), the system 100 can rely on the serialization of the color values by following the "12 Equal Tone Temperament". In one embodiment, sound databases can also be manufactured in this way, with the aid of the piano keyboard in FIG. 2 or other similar musical note correlation table, thereby enabling the system 100 to accurately match the visual values to audio ones.

[0076] By way of example, there are many types of color spaces that can be used by the system 100. Accordingly, the example color spaces are discussed herein by way of illustration and not limitation. For example, the most commonly used in the USA is Pantone, in Germany the RAL color space, in Japan the DIC color space, in China the ISO and Chinese architectural color space, in Sweden the NCS (Natural Color System) color space, and in addition the American Munsell color space and American Rite digital color space. The Blum color space is less commonly used. Pantone also features a TPX/TCX special textile color space, and CSI is another color space used for printing on textiles.

[0077] Currently, the Pantone Color Card Matching System is a world-renowned system for printing and other disciplines requiring the efficient communication of color data. It has become the de facto international language for colors. Customers can use the Pantone color space for graphic design, textiles, furniture, color management, outdoor architecture, interior design and other fields. As a globally recognized and leading service provider, Pantone Color Institute has also become an important resource for the world's most influential media.

[0078] In Germany and the rest of Europe the standard color space is a German brand, which is also respected around the world, known as the RAL international color space. Since 1927, when RAL was founded, it has created a common language for naming and establishing standards for colors, and these are now internationally understood and applied. For over 70 years, RAL has used 4-digit numbers to refer to shades, which now number more than 200, for example using RAL 840HR to refer to a matte color space, and RAL 841GL for the glossy color space. These basic color swatches have a wide range of applications and have been used by a number of important companies and research institutes. RAL 840HR and RAL 841GL are used to design samples with certain colors, but they also offer security and reference numbers corresponding with the DIN and ISO systems of color measurement.

[0079] The RAL system has been developed for professional color design, particularly in the construction industry. It contains a regular index of 1688 colors, with each color having a 7-digit number assigned to it. The index is not in any particular order, but the 7-digit number takes into account color, brightness, saturation and HLC technical measurements. Since the number is based on the standard international CIE coefficient between wavelength and perceived color, it is an outstanding tool for helping designers and other persons concerned with color. With it color coordination becomes very easy.

[0080] The Natural Color System (NCS) color space is the color space tool used in Sweden, which classifies colors according to how the eye perceives them. Perceived color is defined and given a number by the NCS. These cards also take into account properties such as saturation, brightness and hue. The NCS describes the visual properties regardless of the pigment formulations and optical parameters.

[0081] Other color spaces include the Ostwald international color system and the Munsell color system, and color spaces have also been drawn from various design software such as Adobe Photoshop and Adobe Illustrator.

[0082] In one embodiment, the system 100 can use a variety of color spaces as resources to assign values and notes to specific colors, where a color sequence is picked that features the color card corresponding to the value of each tone. There is a coding sequence that identifies the value of each color on the card, which then corresponds to a tone on a musical continuum and therefore a sound. This coding sequence is called the color-tone matching system.

[0083] In addition, there are a variety of ways to transform audio data into visual data, using different elements for the transformation and obtaining different visual effects. In one embodiment, shifting between color cards from the color space will result in different color sequences. In one embodiment, the algorithms used are arranged with certain rules for certain features combined together, including hue, lightness, purity, complementation, and so on. These characteristics can ensure that resulting visual representations such as paintings have a strong sense of brightness, and are rich in color that is both modern and decorative, even giving the illusory sense of space.

[0084] In another embodiment, the system 100 can change how the color values are sequenced, for example having the sequence range from cold to warm, or warm to cold in a gradient. In order to make the picture colorful, they can be arranged in a color wheel, for example with the rich colors representing the Earth's equatorial regions, the light whites and grays like the northern hemisphere, and dark grays and blacks like the southern latitudes.

[0085] In one embodiment, the system 100 can also arrange the values by brightness, for example from light to dark, or dark to light, again in the form of a gradient. Generally, this works best as part of a monochrome palette, and if too many shades are involved then the result becomes a little disordered and can become counterproductive.

[0086] In one embodiment, the system 100 uses the color's purity to rearrange the sequence of values. For example, the sequence of colors bright to dark within the shade of "gray" may lead to new permutations when the interim shades are considered. As another example, a color wheel can feature opposing pairs of contrasting colors that can be combined to fill the color cells. In one embodiment, these combinations can be shifted 180° around the circle to create numerous new variations.

[0087] In yet another embodiment, the system 100 can transform the sequence using a range of different factors including, but not limited to, the hue, lightness and purity, to provide a more holistic arrangement. This more complicated gradation will combine, for instance, these three factors to create a more complex and richer product.
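The sequencing options above (by hue, by lightness, by purity, by a combination, or by a 180° shift around the color wheel) could be sketched with Python's standard `colorsys` module. This is a minimal illustration: treating "purity" as HLS saturation is an assumption, and the function names `order_colors`, `combined_order`, and `complement` are hypothetical.

```python
import colorsys


def to_hls(hex_color):
    """Convert '#RRGGBB' to (hue, lightness, saturation), each in 0.0-1.0."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hls(r, g, b)


def order_colors(colors, by="hue", reverse=False):
    """Sort colors by one factor; 'purity' is assumed to mean HLS saturation."""
    index = {"hue": 0, "lightness": 1, "purity": 2}[by]
    return sorted(colors, key=lambda c: to_hls(c)[index], reverse=reverse)


def combined_order(colors):
    """Sort by hue first, then lightness, then purity."""
    return sorted(colors, key=to_hls)


def complement(hex_color):
    """Shift a color 180 degrees around the hue circle."""
    hue, lightness, saturation = to_hls(hex_color)
    rgb = colorsys.hls_to_rgb((hue + 0.5) % 1.0, lightness, saturation)
    r, g, b = (min(255, max(0, round(v * 255))) for v in rgb)
    return "#{:02X}{:02X}{:02X}".format(r, g, b)
```

Sorting with `by="lightness"` yields a dark-to-light gradient, and `reverse=True` flips it, so one set of color cards can produce several distinct sequences for the same tonal sequence.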

[0088] In one embodiment, color cards are assigned to each color value in the sequence, which in turn matches the musical tones in a continuous sequence. A visual and intuitive form of painting can be expressed through hearing music, via data conversion between image and sound.

[0089] As previously described, in one embodiment, the conversion between audio and visual data using this method consists of the following steps: first manually encoding the tonal sequences in the music, and then picking a color sequence from the color space and determining color values to correspond with the tonal sequence. Thus the audio and visual sequences will be matched (e.g., a previously determined mapping or correlation), with each note value being matched to a color chosen from the selected color sequence. For example, the system 100 can match the color block using a library of sound. First, the system 100 receives or determines the tonal sequences associated with the audio data, and then can choose an appropriate color range from the color space. The system 100 will logically match the tonal sequence to the color sequence.

[0090] In one embodiment, the system 100 can use data related to the progression of the melody, note values, sound volume and other factors to derive characteristics that can have color values, such as a color card, assigned to them. In one embodiment, the system 100 can generate the visual representation or image of the audio data in any medium, be it watercolor, gouache, acrylic, or oil paints.

[0091] For example, the midrange sequence of basic tones C, C# (Db), D, D# (Eb), E, F, F# (Gb), G, G# (Ab), A, A# (Bb), B is matched to a digitally encoded sequence of 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, with a gradient of yellow fading to purple used to demonstrate this.

[0092] For example, the Pantone color space from the US has a gradient from yellow to red, and then to purple. Both the forward count and backward count of each color value chosen from the color space can be matched with the midrange note sequence. The color-coded values can also be used in reverse order, in order to match the sequence of alto tones that have been encoded. The alto tones used upstream in one numerical order can also be used downstream in another order to correspond with different color values.
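The forward and reverse matching described above might be sketched as follows. This is an illustration only: the RGB endpoints for yellow and purple are assumed placeholder values rather than actual Pantone card values, the gradient is a simple two-endpoint linear interpolation (it does not pass through red), and the names `gradient` and `map_notes_to_colors` are hypothetical.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples, with t in 0.0-1.0."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def gradient(start, end, steps):
    """Return `steps` evenly spaced colors from start to end (steps >= 2)."""
    return [lerp_color(start, end, i / (steps - 1)) for i in range(steps)]


def map_notes_to_colors(first_note, colors, reverse=False):
    """Assign consecutive note numbers to a color sequence, optionally reversed."""
    sequence = list(reversed(colors)) if reverse else list(colors)
    return {first_note + i: color for i, color in enumerate(sequence)}


# Assumed endpoint values for illustration; not taken from a Pantone card.
YELLOW = (255, 255, 0)
PURPLE = (128, 0, 128)

cards = gradient(YELLOW, PURPLE, 12)
forward = map_notes_to_colors(13, cards)                  # C (13) is yellow
backward = map_notes_to_colors(13, cards, reverse=True)   # C (13) is purple
```

Either dictionary gives a one-to-one correspondence between the twelve midrange note numbers and twelve color cards, which is what keeps the mapping reversible.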

[0093] By way of example, the following Table 1 shows how timbre is encoded to correspond with certain color cards.

[0094] In this example, the above sequence uses the yellow to purple gradient from the Pantone color space to match the tempered alto tones, and such a method can also be reversed. The idea is to follow a continuous logic when encoding such data digitally.

[0095] In selecting a color value based on the high and low ranges in music, the system 100 can increase or decrease the brightness or intensity of the color in proportion to this. As illustrated above from C13 to B, the color value varies from purple (dark) to yellow (light) in a gradient, and vice versa.

[0096] In one embodiment, the system 100 considers a time value or duration of a tone or note when generating a visual representation of audio data. By way of example, common durations for notes include semibreves, minims, crotchets, quavers, semiquavers, etc. In many cases, different notes have different time durations. For example, time duration in the score is used to express the relative duration between each bar. Time duration also determines how long a note lasts. Music is known as the art of time because of its inextricable connection to the movement of time.

[0097] Therefore, in one embodiment, the system 100 can match the time duration of a note to the cell area (e.g., the visual element representing a note in a visual representation) and make the associated cell area proportional to the duration. For example, if the system 100 sets the time duration of a crotchet equal to the area of a unit cell, then in one bar, a crotchet takes up one unit cell; a minim takes up two unit cells; a semibreve takes up four unit cells; a quaver takes up half a unit cell. In comparison, if the system 100 sets the time duration of a minim as the area of a unit cell, then a crotchet takes up half a unit cell; a minim takes up one unit cell; a semibreve takes up two unit cells; a quaver takes up one fourth of a unit cell. If the system 100 sets the duration of a semibreve equal to the area of a unit cell, a crotchet takes up one fourth of a unit cell; a minim takes up half of a unit cell; a semibreve takes up one unit cell; a quaver takes up one eighth of a unit cell. And so forth, the area each specific note takes up is proportional to the duration of a note in accordance with the preset unit cell area.
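The proportional rule can be checked with exact fractions. The sketch below is illustrative: it uses the standard relative note values (a semibreve equals two minims, four crotchets, eight quavers, or sixteen semiquavers), and the function name `cell_area` and its parameters are hypothetical.

```python
from fractions import Fraction

# Standard note durations expressed as fractions of a semibreve.
NOTE_VALUES = {
    "semibreve": Fraction(1),
    "minim": Fraction(1, 2),
    "crotchet": Fraction(1, 4),
    "quaver": Fraction(1, 8),
    "semiquaver": Fraction(1, 16),
}


def cell_area(note, unit_note="crotchet", unit_area=Fraction(1)):
    """Area of the cell for `note` when `unit_note` occupies `unit_area`."""
    return NOTE_VALUES[note] / NOTE_VALUES[unit_note] * unit_area
```

With a crotchet as the unit, `cell_area("semibreve")` is 4 and `cell_area("quaver")` is 1/2; with a semibreve as the unit, `cell_area("quaver", "semibreve")` is 1/8. Using `Fraction` avoids rounding drift when many small cells are laid out in one bar.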

[0098] Therefore, the time duration of the audio data corresponds to the area it takes up based on the preset unit area.

[0099] In one embodiment, the system 100 recognizes the gradual change between the colors in a continuous color sequence taken from a color card, and the logical order of the scale sequence. It links up the two sequences. In this way, a visual representation or image consisting of continuous color bars (or other visual elements) can be generated. Multiple images can be generated based on the same tonal sequence by varying the choice of the color sequence. Unlike the previous experimental cross-border attempts to combine music and art, this application allows viewers to recognize the well-known songs hidden behind the painting by following the color bar coding. The resulting image is a colored expression of a musical score with color sequences taken from various color spaces that encode both tone and duration so that the original music or audio data can be recreated from the image.

[00100] FIGs. 3A and 3B illustrate an example musical score 301 (e.g., "the Star Spangled Banner") that represents an example set of audio data that can be converted. In this example, the musical performance represented in this audio data will be converted into visual form with notes occupying individual cells of color (e.g., individual visual elements that take the form of cells or bars). Although this example provides a musical score 301 that includes two parts (e.g., music to be played by the left hand and music to be played by the right hand), it is contemplated that music with only one part or more than two parts may also be used. In one embodiment, the system 100 can detect the presence of multi-part music and then present a prompt to a user to select one or more parts for conversion. If more than one part is selected, multiple images can be created or overlapped into a single image based on, for instance, how a user configures the system 100.

[00101] It also is noted that although various embodiments describe visual elements that are cells, it is contemplated that the visual element can be any shape or figure. On a basic level, each tone from the music is converted into a color for the image. Each cell or visual element is filled with a color, pattern, and/or design in a manner that is consistent with each tone to generate the image, and the duration of each tone or musical note is encoded in the size or area of the cell.

[00102] In this example, the values corresponding to the features (e.g., tone/note and duration) in the music will be arranged from low to high. From this sequence, the system 100 determines the appropriate color card, which will be the one that corresponds to that particular sound. It should be noted that not all tones will appear in one piece of audio data, but nevertheless these tones should also be assigned their respective color values in order to keep the mapping consistent and coherent.

[00103] Within the range to choose from, the music's features will dictate the distribution of certain colors, as well as patterns such as a fade from dark to light or strong to pale colors, for example. The example below is "The Star Spangled Banner" as shown in FIGs. 3A and 3B, where notes numbered from 37 to 56 have been associated with certain color cards, and the color gradually deepens from #000000 to #0000FF. Where there are gaps in the music a special color, such as gold, might be used to give additional labelling for instruction.

[00104] The following Table 2 shows the song "The Star Spangled Banner" as an example of how tones encode into note values, and their correspondence with the Color space's color cards.

Note Time Color

[00105] Finally, the system 100 can fill the cell with the color specified by the corresponding audio sample. In this way, the law of musical melody can be transferred to a color gradient. In one embodiment, the system 100 can be configured to use or provide instructions for using painting materials, for example acrylic or watercolors, in order to fill the cells in the visual representation.

[00106] FIG. 4A depicts one visual representation of the music using one color sequence while FIG. 4B depicts a visual representation of music using another color sequence to represent the same audio data. For example, the visual representation 401a of FIG. 4A is based on a color sequence mapped to specific note values as shown in the legend 403a of FIG. 4A. In contrast, the visual representation 401b of FIG. 4B (which depicts the same audio data as the visual representation 401a) is mapped to a different color sequence as shown in the legend 403b of FIG. 4B. Although FIGs. 4A and 4B may not be reproduced in color, the color values listed in the respective legends 403a and 403b map different color values to the individual note values which are visually represented in different shading.

[00107] In addition, although the examples of FIGs. 4A and 4B show the visual representation comprising rectangular-shaped (parallelogram-shaped) cells or visual elements, it should be noted that it can also be divided into different cell shapes, such as circles, trapezoids, triangles, diamonds, hexagons, crescents, and/or any other shapes, in order to enhance and enrich the picture. In one embodiment, regardless of shape, the fill color will correspond with the color value, and the cell areas or sizes will correspond to the durations of the notes.

[00108] FIG. 4C depicts a visual representation 421 that uses patterns or designs in addition to color to represent each tone or note. As shown in FIG. 4C, a legend 423 is included in the visual representation 421 to indicate which colors, patterns, and/or designs correspond to which tones or notes.

[00109] FIG. 5 is a flowchart of a process for converting audio data into a visual representation, according to one embodiment. In one embodiment, the audio/visual conversion platform 103 performs the process 500 and is implemented in, for instance, a chip set including a processor and a memory as shown in FIG. 11. In addition or alternatively, the audio/visual conversion application may perform all or a portion of the process 500.

[00110] In step 501, the platform 103 determines one or more characteristic values from audio data, wherein the one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes. In one embodiment, the one or more characteristics or features are selected from audio data. By way of example, the audio data may be songs, music, drama or fragments of some larger work. In addition, the audio data may be in the form of a musical score (e.g., sheet music) from a song book or as audible data sampled by the platform 103. In one embodiment, the characteristics chosen may include notes, pitch, duration, rhythm, melody and time values, to name just a few. For example, the platform 103 can parse an orchestral score (e.g., by performing image recognition of the staff notation provided in the score - see FIGs. 3A and 3B for an example) to measure the duration of each note as well as its pitch.

[00111] In step 503, the platform 103 maps the one or more musical notes to a first visual characteristic of one or more visual elements, wherein the first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements. In one embodiment, the one or more visual elements is a geometric shape. In one embodiment, the geometric shape includes a parallelogram. In one embodiment, the values elicited from the audio data are mapped to a certain range of colors. These may be specified as a range of colors, bars or color cards. These cards may have a logical organization, such as the progression from pale colors to much darker ones. Color spaces including examples such as Pantone, the German RAL color space, the Japanese DIC color space, Chinese ISO and building CCD and Swedish NCS (Natural Color System), can be used to facilitate this. Determining exactly how each mapped value might correspond to each color might require further analysis of the audio characteristics, in order to sequence the colors in a logical sequence.

[00112] In step 505, the platform 103 maps the respective durations to a second visual characteristic of the one or more visual elements, wherein the second visual characteristic includes a size of the one or more visual elements.

[00113] In step 507, the platform 103 generates a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.
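
By way of a non-limiting illustration, steps 503 through 507 might be sketched as follows. The sketch assumes that step 501 has already produced a list of note names and durations in beats. The palette `NOTE_COLORS`, the class `VisualElement`, the functions `map_notes` and `render_svg`, and the use of SVG rectangles as the visual elements are all hypothetical choices made for the example and are not prescribed by this description.

```python
from dataclasses import dataclass

# Hypothetical palette: one hex color per note name (illustrative assumption).
NOTE_COLORS = {
    "C": "#d62728", "D": "#ff7f0e", "E": "#ffdd57", "F": "#2ca02c",
    "G": "#17becf", "A": "#1f77b4", "B": "#9467bd",
}


@dataclass
class VisualElement:
    color: str     # first visual characteristic (step 503)
    width: float   # second visual characteristic (step 505)
    height: float  # held constant in this sketch


def map_notes(notes, unit_width=40.0, height=40.0):
    """Map (note_name, duration_in_beats) pairs to visual elements."""
    return [
        VisualElement(NOTE_COLORS[name], beats * unit_width, height)
        for name, beats in notes
    ]


def render_svg(elements):
    """Lay the elements out left to right and return an SVG string (step 507)."""
    x = 0.0
    rects = []
    for e in elements:
        rects.append(
            f'<rect x="{x:g}" y="0" width="{e.width:g}" '
            f'height="{e.height:g}" fill="{e.color}"/>'
        )
        x += e.width
    total_height = max((e.height for e in elements), default=0.0)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{x:g}" height="{total_height:g}">' + "".join(rects) + "</svg>"
    )


# Step 501 is assumed to be done: notes and durations are already extracted.
melody = [("C", 1.0), ("E", 0.5), ("G", 2.0)]
elements = map_notes(melody)
svg = render_svg(elements)
```

In this sketch each note contributes one rectangle whose fill encodes the note and whose width encodes the duration; any other shape, pattern or color space could be substituted without changing the structure.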

[00114] In one embodiment, the platform 103 can determine whether additional metadata other than simply the audio itself is available. For example, metadata can be used to determine the composer, style, theme, etc. associated with the audio data. In one embodiment, the metadata itself may specify a color sequence or space to use. The platform 103 can then determine whether the metadata is associated with certain colors, and can then further generate or otherwise adjust the image or visual representation based on this metadata.

[00115] FIG. 6 is a flowchart of a process for determining an audio sequence and selecting a color space for a visual representation of audio data, according to one embodiment. In one embodiment, the audio/visual conversion platform 103 performs the process 600 and is implemented in, for instance, a chip set including a processor and a memory as shown in FIG. 11. In addition or alternatively, the audio/visual conversion application may perform all or a portion of the process 600.

[00116] In step 601, the platform 103 processes the audio data to determine an audio sequence of the one or more musical notes.

[00117] In step 603, the platform 103 determines one or more values for the first visual characteristic based on the audio sequence.

[00118] In step 605, the platform 103 selects at least one color space for the visual representation. In one embodiment, the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

[00119] In step 607, the platform 103 maps a color sequence for the one or more values of the first visual characteristic based on the at least one color space. In one embodiment, a number of colors are selected from the collection of colors used, based on the mappings. For example, the system 100 can select, from the range of colors available, colors for the notes that are associated with certain aspects of the sound. In one embodiment, aspects such as the brightness of a sound would determine a stronger color, whilst a deeper sound might elicit something darker.

[00120] In one embodiment, the platform 103 can pick up different elements from the audio data to match the color sequence, so as to generate or adjust the image.

[00121] In one embodiment, the system 100 can also use other audio and/or visual cues to create a set of values that will adjust the finished image to relate even more closely to the original media. For example, the platform 103 can change color values in real time during a performance based on the singer's language, accent, dress, expressions and/or other factors. For example, when the singer sings in American English, you can match with colors closely related to the US, such as the red, white and blue found in the Stars and Stripes. Upon detecting that the singer is wearing certain outdoor elements (such as a wolfskin jacket - Jack Wolfskin - for outdoor climbing), it is possible to match colors with these elements, for example white for a snowy mountain, green for grasslands, blue for the ocean and so on. The system 100 can also detect the facial expression of the singer (e.g., through optical image recognition), and assign a color such as gray should the singer be, for example, frowning.

[00122] In this case, the application may also include positioning, testing and analyzing equipment, as well as sensing devices. These devices can detect and analyze location markers within the audio data, the quality and accent of the singer's voice, their clothing, facial expressions and other elements, and can adjust the image so that these factors are included.

[00123] By adjusting the image, the color value can be changed as mentioned above, changing the color sequence and thereby the colors expressed in the image.

[00124] In step 609, the platform 103 changes the color sequence for the one or more values of the first visual characteristic by shifting along the at least one color space.
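
As one possible illustration of steps 601 through 609, a color space can be modeled as an ordered list of swatches running from pale to dark, with each distinct note in the audio sequence assigned a position in that list. The sketch below makes several assumptions that are not taken from this description: notes are given as MIDI note numbers, higher pitches receive paler swatches, the six-entry `PALETTE` stands in for a real color space, and shifting in step 609 wraps around the end of the list. All function names are hypothetical.

```python
# Hypothetical stand-in for a selected color space, ordered pale -> dark.
PALETTE = ["#fde0dd", "#fa9fb5", "#f768a1", "#c51b8a", "#7a0177", "#49006a"]


def map_color_indices(note_sequence, palette_size):
    """Assign each distinct note a palette index (steps 603 and 607).

    Assumption: the highest pitch gets index 0 (palest swatch) and
    deeper pitches get progressively darker swatches.
    """
    distinct = sorted(set(note_sequence), reverse=True)
    if len(distinct) > palette_size:
        raise ValueError("palette has fewer swatches than distinct notes")
    return {note: index for index, note in enumerate(distinct)}


def shift_indices(indices, offset, palette_size):
    """Shift every note's index along the color space (step 609), wrapping."""
    return {note: (i + offset) % palette_size for note, i in indices.items()}


def colors_for(note_sequence, indices, palette):
    """Return the color sequence for the audio sequence."""
    return [palette[indices[note]] for note in note_sequence]


sequence = [60, 64, 67, 64]  # C4, E4, G4, E4 as MIDI note numbers
indices = map_color_indices(sequence, len(PALETTE))
original = colors_for(sequence, indices, PALETTE)
shifted = colors_for(sequence, shift_indices(indices, 2, len(PALETTE)), PALETTE)
```

A metadata-driven or cue-driven adjustment of the kind described above could be expressed in the same way, by choosing a different palette or a different offset.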

[00125] FIG. 7 is a flowchart of a process for representing duration of musical notes in audio data using a size of a visual element, according to one embodiment. In one embodiment, the audio/visual conversion platform 103 performs the process 700 and is implemented in, for instance, a chip set including a processor and a memory as shown in FIG. 11. In addition or alternatively, the audio/visual conversion application may perform all or a portion of the process 700.

[00126] In step 701, the platform 103 specifies the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes.

[00127] In step 703, the platform 103 varies the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes relative to one axis of the one or more visual elements while maintaining a fixed size along another axis of the one or more visual elements.
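
Steps 701 and 703 can be illustrated with a parallelogram whose extent along one axis is proportional to the note duration while its extent along the other axis stays fixed. The following is a minimal sketch under stated assumptions: the function name `parallelogram`, the scale factor `unit`, and the `height` and `skew` defaults are hypothetical values chosen for the example.

```python
def parallelogram(x0, duration, unit=30.0, height=20.0, skew=8.0):
    """Return the four vertices of a parallelogram for one note.

    The base length is proportional to the duration (step 701), and only
    the horizontal axis varies; the height is fixed (step 703).
    """
    width = duration * unit
    return [
        (x0, 0.0),                    # bottom-left
        (x0 + width, 0.0),            # bottom-right
        (x0 + width + skew, height),  # top-right
        (x0 + skew, height),          # top-left
    ]


quarter = parallelogram(0.0, 1.0)  # one beat
half = parallelogram(30.0, 2.0)    # two beats, placed after the first shape
```

Here the half note is drawn twice as long as the quarter note along the horizontal axis, while both shapes share the same height.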

[00128] FIG. 8 is a flowchart of a process for generating a legend of presentation in a visual representation of audio data, according to one embodiment. In one embodiment, the audio/visual conversion platform 103 performs the process 800 and is implemented in, for instance, a chip set including a processor and a memory as shown in FIG. 11. In addition or alternatively, the audio/visual conversion application may perform all or a portion of the process 800.

[00129] In step 801, the platform 103 generates a legend for correlating the one or more values for the first visual characteristic to the audio sequence, the one or more musical notes, or a combination thereof.

[00130] In step 803, the platform 103 presents the legend in the visual representation.
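
A legend of the kind generated in step 801 can be as simple as a list of color and note pairs. The sketch below is illustrative only: the function `build_legend` and the sample mapping are hypothetical, and the legend is ordered by first appearance of each note in the audio sequence, which is one choice among many.

```python
def build_legend(note_sequence, note_to_color):
    """Return (color, note) pairs, ordered by first appearance in the sequence."""
    legend = {}
    for note in note_sequence:
        # setdefault keeps the first note seen for each color.
        legend.setdefault(note_to_color[note], note)
    return list(legend.items())


# Hypothetical mapping from note names to colors.
note_to_color = {"C": "#d62728", "E": "#ffdd57", "G": "#17becf"}
legend = build_legend(["C", "E", "G", "E", "C"], note_to_color)
```

The resulting pairs could then be drawn in a corner of the visual representation, as in step 803, so that a viewer or a decoding process can recover which color stands for which note.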

[00131] FIG. 9 is a flowchart of a process for converting a visual representation of audio data into an audio sequence for playback, according to one embodiment. In one embodiment, the audio/visual conversion platform 103 performs the process 900 and is implemented in, for instance, a chip set including a processor and a memory as shown in FIG. 11. In addition or alternatively, the audio/visual conversion application may perform all or a portion of the process 900.

[00132] In step 901, the platform 103 processes a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation, wherein the first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements, and wherein the second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements. In one embodiment, the one or more visual elements is a geometric shape. In one embodiment, the geometric shape includes a parallelogram.

[00133] In one embodiment, the characteristic values obtained from the image or visual representation may include color shades, ranges, hue, brightness, contrast and purity.

[00134] In one embodiment, the platform 103 determines a mapping of the color, a pattern, a design, or a combination thereof to a tonal range, wherein the mapping was used to generate the visual representation. The platform 103 then extracts the one or more musical notes from the visual representation based on the mapping.

[00135] In one embodiment, the platform 103 determines at least one color space associated with the visual representation. In one embodiment, the extracting of the one or more musical notes from the visual representation is further based on the at least one color space. In one embodiment, the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

[00136] In one embodiment, the platform 103 processes the visual representation to extract a legend for correlating the at least one color space to the tonal range or to the one or more musical notes. In one embodiment, the mapping of the color space to the tonal range is based on the extracted legend.

[00137] In one embodiment, the platform 103 processes the size of the one or more visual elements to determine the respective durations of the one or more musical notes. In one embodiment, the respective durations of the one or more musical notes are encoded with respect to at least one axis of the one or more visual elements. In one embodiment, the size of the one or more visual elements is proportional to the respective durations of the one or more musical notes corresponding to the one or more visual elements.

[00138] In step 903, the platform 103 generates an audio sequence based on the one or more musical notes and the respective durations. In one embodiment, the image data or visual representation may also be generated or adjusted in relation to other data related to the image in order to make more harmonious tones. For example the color bars, color patterns, consistency of painting and colors selected need to be associated with a pattern of notes, where the composer can then refine them. In addition, the processes described above for using metadata to generate or adjust the visual representation may also be used for generating audio data from images.

[00139] In step 905, the platform 103 initiates a playback of the audio sequence.
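
The reverse conversion of process 900 can be sketched as follows, under several assumptions that are not part of this description: the visual elements have already been detected and reduced to color and width pairs, the legend maps each color to a note name, the notes lie in the octave containing A4 at 440 Hz in equal temperament, and the audio sequence is rendered as plain sine tones. The names `decode_elements` and `synthesize` are hypothetical, and actual playback (step 905) is left to whatever audio output the host provides.

```python
import math

A4_HZ = 440.0
# Semitone offsets from A4 for the natural notes C4 through B4.
SEMITONES_FROM_A4 = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def decode_elements(elements, legend, unit_width):
    """Recover (note_name, beats) pairs from (color, width) pairs (step 901)."""
    return [(legend[color], width / unit_width) for color, width in elements]


def synthesize(notes, seconds_per_beat=0.5, sample_rate=8000):
    """Generate sine-wave samples in [-1, 1] for the note sequence (step 903)."""
    samples = []
    for name, beats in notes:
        freq = A4_HZ * 2 ** (SEMITONES_FROM_A4[name] / 12)
        count = int(round(beats * seconds_per_beat * sample_rate))
        samples.extend(
            math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count)
        )
    return samples


legend = {"#d62728": "C", "#17becf": "G"}  # hypothetical extracted legend
elements = [("#d62728", 40.0), ("#17becf", 20.0)]
notes = decode_elements(elements, legend, unit_width=40.0)
samples = synthesize(notes)
```

Because duration is recovered by dividing the measured width by a unit width, the decoding only works if the same proportionality constant was used when the visual representation was generated.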

[00140] The processes described herein for converting audio data into a visual representation may be advantageously implemented via software, hardware, firmware or a combination of software and/or firmware and/or hardware. For example, the processes described herein may be advantageously implemented via processor(s), Digital Signal Processing (DSP) chip, an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Arrays (FPGAs), etc. Such exemplary hardware for performing the described functions is detailed below.

[00141] FIG. 10 illustrates a computer system 1000 upon which an embodiment of the invention may be implemented. Although computer system 1000 is depicted with respect to a particular device or equipment, it is contemplated that other devices or equipment (e.g., network elements, servers, etc.) within FIG. 10 can deploy the illustrated hardware and components of system 1000. Computer system 1000 is programmed (e.g., via computer program code or instructions) to convert audio data into a visual representation as described herein and includes a communication mechanism such as a bus 1010 for passing information between other internal and external components of the computer system 1000. Information (also called data) is represented as a physical expression of a measurable phenomenon, typically electric voltages, but including, in other embodiments, such phenomena as magnetic, electromagnetic, pressure, chemical, biological, molecular, atomic, sub-atomic and quantum interactions. For example, north and south magnetic fields, or a zero and non-zero electric voltage, represent two states (0, 1) of a binary digit (bit). Other phenomena can represent digits of a higher base. A superposition of multiple simultaneous quantum states before measurement represents a quantum bit (qubit). A sequence of one or more digits constitutes digital data that is used to represent a number or code for a character. In some embodiments, information called analog data is represented by a near continuum of measurable values within a particular range. Computer system 1000, or a portion thereof, constitutes a means for performing one or more steps of converting audio data into a visual representation.

[00142] A bus 1010 includes one or more parallel conductors of information so that information is transferred quickly among devices coupled to the bus 1010. One or more processors 1002 for processing information are coupled with the bus 1010.

[00143] A processor (or multiple processors) 1002 performs a set of operations on information as specified by computer program code related to converting audio data into a visual representation. The computer program code is a set of instructions or statements providing instructions for the operation of the processor and/or the computer system to perform specified functions. The code, for example, may be written in a computer programming language that is compiled into a native instruction set of the processor. The code may also be written directly using the native instruction set (e.g., machine language). The set of operations include bringing information in from the bus 1010 and placing information on the bus 1010. The set of operations also typically include comparing two or more units of information, shifting positions of units of information, and combining two or more units of information, such as by addition or multiplication or logical operations like OR, exclusive OR (XOR), and AND. Each operation of the set of operations that can be performed by the processor is represented to the processor by information called instructions, such as an operation code of one or more digits. A sequence of operations to be executed by the processor 1002, such as a sequence of operation codes, constitute processor instructions, also called computer system instructions or, simply, computer instructions. Processors may be implemented as mechanical, electrical, magnetic, optical, chemical or quantum components, among others, alone or in combination.

[00144] Computer system 1000 also includes a memory 1004 coupled to bus 1010. The memory 1004, such as a random access memory (RAM) or any other dynamic storage device, stores information including processor instructions for converting audio data into a visual representation. Dynamic memory allows information stored therein to be changed by the computer system 1000. RAM allows a unit of information stored at a location called a memory address to be stored and retrieved independently of information at neighboring addresses. The memory 1004 is also used by the processor 1002 to store temporary values during execution of processor instructions. The computer system 1000 also includes a read only memory (ROM) 1006 or any other static storage device coupled to the bus 1010 for storing static information, including instructions, that is not changed by the computer system 1000. Some memory is composed of volatile storage that loses the information stored thereon when power is lost. Also coupled to bus 1010 is a non-volatile (persistent) storage device 1008, such as a magnetic disk, optical disk or flash card, for storing information, including instructions, that persists even when the computer system 1000 is turned off or otherwise loses power.

[00145] Information, including instructions for converting audio data into a visual representation is provided to the bus 1010 for use by the processor from an external input device 1012, such as a keyboard containing alphanumeric keys operated by a human user, or a sensor. A sensor detects conditions in its vicinity and transforms those detections into physical expression compatible with the measurable phenomenon used to represent information in computer system 1000. Other external devices coupled to bus 1010, used primarily for interacting with humans, include a display device 1014, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a light emitting diode (LED) display, an organic LED (OLED) display, a plasma screen, or a printer for presenting text or images, and a pointing device 1016, such as a mouse, a trackball, cursor direction keys, or a motion sensor, for controlling a position of a small cursor image presented on the display 1014 and issuing commands associated with graphical elements presented on the display 1014. In some embodiments, for example, in embodiments in which the computer system 1000 performs all functions automatically without human input, one or more of external input device 1012, display device 1014 and pointing device 1016 is omitted.

[00146] In the illustrated embodiment, special purpose hardware, such as an application specific integrated circuit (ASIC) 1020, is coupled to bus 1010. The special purpose hardware is configured to perform operations not performed by processor 1002 quickly enough for special purposes. Examples of ASICs include graphics accelerator cards for generating images for display 1014, cryptographic boards for encrypting and decrypting messages sent over a network, speech recognition, and interfaces to special external devices, such as robotic arms and medical scanning equipment that repeatedly perform some complex sequence of operations that are more efficiently implemented in hardware.

[00147] Computer system 1000 also includes one or more instances of a communications interface 1070 coupled to bus 1010. Communication interface 1070 provides a one-way or two-way communication coupling to a variety of external devices that operate with their own processors, such as printers, scanners and external disks. In general the coupling is with a network link 1078 that is connected to a local network 1080 to which a variety of external devices with their own processors are connected. For example, communication interface 1070 may be a parallel port or a serial port or a universal serial bus (USB) port on a personal computer. In some embodiments, communications interface 1070 is an integrated services digital network (ISDN) card or a digital subscriber line (DSL) card or a telephone modem that provides an information communication connection to a corresponding type of telephone line. In some embodiments, a communication interface 1070 is a cable modem that converts signals on bus 1010 into signals for a communication connection over a coaxial cable or into optical signals for a communication connection over a fiber optic cable. As another example, communications interface 1070 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN, such as Ethernet. Wireless links may also be implemented. For wireless links, the communications interface 1070 sends or receives or both sends and receives electrical, acoustic or electromagnetic signals, including infrared and optical signals, that carry information streams, such as digital data. For example, in wireless handheld devices, such as mobile telephones like cell phones, the communications interface 1070 includes a radio band electromagnetic transmitter and receiver called a radio transceiver. In certain embodiments, the communications interface 1070 enables connection to the communication network 105 for converting audio data into a visual representation.

[00148] The term "computer-readable medium" as used herein refers to any medium that participates in providing information to processor 1002, including instructions for execution. Such a medium may take many forms, including, but not limited to computer-readable storage medium (e.g., non-volatile media, volatile media), and transmission media. Non-transitory media, such as non-volatile media, include, for example, optical or magnetic disks, such as storage device 1008. Volatile media include, for example, dynamic memory 1004. Transmission media include, for example, twisted pair cables, coaxial cables, copper wire, fiber optic cables, and carrier waves that travel through space without wires or cables, such as acoustic waves and electromagnetic waves, including radio, optical and infrared waves. Signals include man-made transient variations in amplitude, frequency, phase, polarization or other physical properties transmitted through the transmission media. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, CDRW, DVD, any other optical medium, punch cards, paper tape, optical mark sheets, any other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, an EEPROM, a flash memory, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read. The term computer-readable storage medium is used herein to refer to any computer-readable medium except transmission media.

[00149] Logic encoded in one or more tangible media includes one or both of processor instructions on a computer-readable storage media and special purpose hardware, such as ASIC 1020.

[00150] Network link 1078 typically provides information communication using transmission media through one or more networks to other devices that use or process the information. For example, network link 1078 may provide a connection through local network 1080 to a host computer 1082 or to equipment 1084 operated by an Internet Service Provider (ISP). ISP equipment 1084 in turn provides data communication services through the public, world-wide packet-switching communication network of networks now commonly referred to as the Internet 1090.

[00151] A computer called a server host 1092 connected to the Internet hosts a process that provides a service in response to information received over the Internet. For example, server host 1092 hosts a process that provides information representing video data for presentation at display 1014. It is contemplated that the components of system 1000 can be deployed in various configurations within other computer systems, e.g., host 1082 and server 1092.

[00152] At least some embodiments of the invention are related to the use of computer system 1000 for implementing some or all of the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 1000 in response to processor 1002 executing one or more sequences of one or more processor instructions contained in memory 1004. Such instructions, also called computer instructions, software and program code, may be read into memory 1004 from another computer-readable medium such as storage device 1008 or network link 1078. Execution of the sequences of instructions contained in memory 1004 causes processor 1002 to perform one or more of the method steps described herein. In alternative embodiments, hardware, such as ASIC 1020, may be used in place of or in combination with software to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware and software, unless otherwise explicitly stated herein.

[00153] The signals transmitted over network link 1078 and other networks through communications interface 1070, carry information to and from computer system 1000. Computer system 1000 can send and receive information, including program code, through the networks 1080, 1090 among others, through network link 1078 and communications interface 1070. In an example using the Internet 1090, a server host 1092 transmits program code for a particular application, requested by a message sent from computer 1000, through Internet 1090, ISP equipment 1084, local network 1080 and communications interface 1070. The received code may be executed by processor 1002 as it is received, or may be stored in memory 1004 or in storage device 1008 or any other non-volatile storage for later execution, or both. In this manner, computer system 1000 may obtain application program code in the form of signals on a carrier wave.

[00154] Various forms of computer readable media may be involved in carrying one or more sequence of instructions or data or both to processor 1002 for execution. For example, instructions and data may initially be carried on a magnetic disk of a remote computer such as host 1082. The remote computer loads the instructions and data into its dynamic memory and sends the instructions and data over a telephone line using a modem. A modem local to the computer system 1000 receives the instructions and data on a telephone line and uses an infrared transmitter to convert the instructions and data to a signal on an infra-red carrier wave serving as the network link 1078. An infrared detector serving as communications interface 1070 receives the instructions and data carried in the infrared signal and places information representing the instructions and data onto bus 1010. Bus 1010 carries the information to memory 1004 from which processor 1002 retrieves and executes the instructions using some of the data sent with the instructions. The instructions and data received in memory 1004 may optionally be stored on storage device 1008, either before or after execution by the processor 1002.

[00155] FIG. 11 illustrates a chip set or chip 1100 upon which an embodiment of the invention may be implemented. Chip set 1100 is programmed to convert audio data into a visual representation as described herein and includes, for instance, the processor and memory components described with respect to FIG. 10 incorporated in one or more physical packages (e.g., chips). By way of example, a physical package includes an arrangement of one or more materials, components, and/or wires on a structural assembly (e.g., a baseboard) to provide one or more characteristics such as physical strength, conservation of size, and/or limitation of electrical interaction. It is contemplated that in certain embodiments the chip set 1100 can be implemented in a single chip. It is further contemplated that in certain embodiments the chip set or chip 1100 can be implemented as a single "system on a chip." It is further contemplated that in certain embodiments a separate ASIC would not be used, for example, and that all relevant functions as disclosed herein would be performed by a processor or processors. Chip set or chip 1100, or a portion thereof, constitutes a means for performing one or more steps of providing user interface navigation information associated with the availability of functions. Chip set or chip 1100, or a portion thereof, constitutes a means for performing one or more steps of converting audio data into a visual representation.

[00156] In one embodiment, the chip set or chip 1100 includes a communication mechanism such as a bus 1101 for passing information among the components of the chip set 1100. A processor 1103 has connectivity to the bus 1101 to execute instructions and process information stored in, for example, a memory 1105. The processor 1103 may include one or more processing cores with each core configured to perform independently. A multi-core processor enables multiprocessing within a single physical package. Examples of a multi-core processor include two, four, eight, or greater numbers of processing cores. Alternatively or in addition, the processor 1103 may include one or more microprocessors configured in tandem via the bus 1101 to enable independent execution of instructions, pipelining, and multithreading. The processor 1103 may also be accompanied with one or more specialized components to perform certain processing functions and tasks such as one or more digital signal processors (DSP) 1107, or one or more application-specific integrated circuits (ASIC) 1109. A DSP 1107 typically is configured to process real-world signals (e.g., sound) in real time independently of the processor 1103. Similarly, an ASIC 1109 can be configured to perform specialized functions not easily performed by a more general purpose processor. Other specialized components to aid in performing the inventive functions described herein may include one or more field programmable gate arrays (FPGA) (not shown), one or more controllers (not shown), or one or more other special-purpose computer chips.

[00157] In one embodiment, the chip set or chip 1100 includes merely one or more processors and some software and/or firmware supporting and/or relating to and/or for the one or more processors.

[00158] The processor 1103 and accompanying components have connectivity to the memory 1105 via the bus 1101. The memory 1105 includes both dynamic memory (e.g., RAM, magnetic disk, writable optical disk, etc.) and static memory (e.g., ROM, CD-ROM, etc.) for storing executable instructions that when executed perform the inventive steps described herein to convert audio data into a visual representation. The memory 1105 also stores the data associated with or generated by the execution of the inventive steps.

[00159] FIG. 12 is a diagram of exemplary components of a mobile terminal (e.g., handset) for communications, which is capable of operating in the system of FIG. 1, according to one embodiment. In some embodiments, mobile terminal 1201, or a portion thereof, constitutes a means for performing one or more steps of converting audio data into a visual representation. Generally, a radio receiver is often defined in terms of front-end and back-end characteristics. The front-end of the receiver encompasses all of the Radio Frequency (RF) circuitry whereas the back-end encompasses all of the base-band processing circuitry. As used in this application, the term "circuitry" refers to both: (1) hardware-only implementations (such as implementations in only analog and/or digital circuitry), and (2) to combinations of circuitry and software (and/or firmware) (such as, if applicable to the particular context, to a combination of processor(s), including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions). This definition of "circuitry" applies to all uses of this term in this application, including in any claims. As a further example, as used in this application and if applicable to the particular context, the term "circuitry" would also cover an implementation of merely a processor (or multiple processors) and its (or their) accompanying software/or firmware. The term "circuitry" would also cover if applicable to the particular context, for example, a baseband integrated circuit or applications processor integrated circuit in a mobile phone or a similar integrated circuit in a cellular network device or other network devices.

[00160] Pertinent internal components of the telephone include a Main Control Unit (MCU) 1203, a Digital Signal Processor (DSP) 1205, and a receiver/transmitter unit including a microphone gain control unit and a speaker gain control unit. A main display unit 1207 provides a display to the user in support of various applications and mobile terminal functions that perform or support the steps of converting audio data into a visual representation. The display 1207 includes display circuitry configured to display at least a portion of a user interface of the mobile terminal (e.g., mobile telephone). Additionally, the display 1207 and display circuitry are configured to facilitate user control of at least some functions of the mobile terminal. An audio function circuitry 1209 includes a microphone 1211 and microphone amplifier that amplifies the speech signal output from the microphone 1211. The amplified speech signal output from the microphone 1211 is fed to a coder/decoder (CODEC) 1213.

[00161] A radio section 1215 amplifies power and converts frequency in order to communicate with a base station, which is included in a mobile communication system, via antenna 1217. The power amplifier (PA) 1219 and the transmitter/modulation circuitry are operationally responsive to the MCU 1203, with an output from the PA 1219 coupled to the duplexer 1221 or circulator or antenna switch, as known in the art. The PA 1219 also couples to a battery interface and power control unit 1220.

[00162] In use, a user of mobile terminal 1201 speaks into the microphone 1211 and his or her voice along with any detected background noise is converted into an analog voltage. The analog voltage is then converted into a digital signal through the Analog to Digital Converter (ADC) 1223. The control unit 1203 routes the digital signal into the DSP 1205 for processing therein, such as speech encoding, channel encoding, encrypting, and interleaving. In one embodiment, the processed voice signals are encoded, by units not separately shown, using a cellular transmission protocol such as enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), Internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS), etc., as well as any other suitable wireless medium, e.g., microwave access (WiMAX), Long Term Evolution (LTE) networks, code division multiple access (CDMA), wideband code division multiple access (WCDMA), wireless fidelity (WiFi), satellite, and the like, or any combination thereof.

[00163] The encoded signals are then routed to an equalizer 1225 for compensation of any frequency-dependent impairments that occur during transmission through the air such as phase and amplitude distortion. After equalizing the bit stream, the modulator 1227 combines the signal with an RF signal generated in the RF interface 1229. The modulator 1227 generates a sine wave by way of frequency or phase modulation. In order to prepare the signal for transmission, an up-converter 1231 combines the sine wave output from the modulator 1227 with another sine wave generated by a synthesizer 1233 to achieve the desired frequency of transmission. The signal is then sent through a PA 1219 to increase the signal to an appropriate power level. In practical systems, the PA 1219 acts as a variable gain amplifier whose gain is controlled by the DSP 1205 from information received from a network base station. The signal is then filtered within the duplexer 1221 and optionally sent to an antenna coupler 1235 to match impedances to provide maximum power transfer. Finally, the signal is transmitted via antenna 1217 to a local base station. An automatic gain control (AGC) can be supplied to control the gain of the final stages of the receiver. The signals may be forwarded from there to a remote telephone which may be another cellular telephone, any other mobile phone or a land-line connected to a Public Switched Telephone Network (PSTN), or other telephony networks.

[00164] Voice signals transmitted to the mobile terminal 1201 are received via antenna 1217 and immediately amplified by a low noise amplifier (LNA) 1237. A down-converter 1239 lowers the carrier frequency while the demodulator 1241 strips away the RF leaving only a digital bit stream. The signal then goes through the equalizer 1225 and is processed by the DSP 1205. A Digital to Analog Converter (DAC) 1243 converts the signal and the resulting output is transmitted to the user through the speaker 1245, all under control of a Main Control Unit (MCU) 1203 which can be implemented as a Central Processing Unit (CPU) (not shown).
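
By way of illustration only, the frequency translation performed when the up-converter 1231 combines the modulator output with the synthesizer output can be sketched numerically. The sketch assumes an ideal multiplying mixer, which the description does not specify, and the function names and frequencies are hypothetical. It shows that the product of two sine waves is equivalent to a pair of components at the difference and the sum of the two input frequencies.

```python
import math


def mixer_output(f_mod_hz, f_synth_hz, t):
    """Ideal multiplying mixer (an assumption): product of two unit sine waves."""
    return math.sin(2 * math.pi * f_mod_hz * t) * math.sin(2 * math.pi * f_synth_hz * t)


def sum_and_difference(f_mod_hz, f_synth_hz, t):
    """The same signal written as its difference- and sum-frequency components."""
    low = math.cos(2 * math.pi * (f_synth_hz - f_mod_hz) * t)
    high = math.cos(2 * math.pi * (f_synth_hz + f_mod_hz) * t)
    return 0.5 * (low - high)


# Illustrative frequencies only; they are not taken from the description.
F_MOD_HZ = 10_000.0
F_SYNTH_HZ = 90_000.0

# Compare both forms at 1000 sample instants spaced one microsecond apart.
max_error = max(
    abs(
        mixer_output(F_MOD_HZ, F_SYNTH_HZ, n / 1_000_000)
        - sum_and_difference(F_MOD_HZ, F_SYNTH_HZ, n / 1_000_000)
    )
    for n in range(1000)
)
```

Under these assumptions the two forms agree to within floating-point rounding, so a subsequent filter could retain either the sum or the difference component as the transmission frequency.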

[00165] The MCU 1203 receives various signals including input signals from the keyboard 1247. The keyboard 1247 and/or the MCU 1203 in combination with other user input components (e.g., the microphone 1211) comprise a user interface circuitry for managing user input. The MCU 1203 runs a user interface software to facilitate user control of at least some functions of the mobile terminal 1201 to convert audio data into a visual representation. The MCU 1203 also delivers a display command and a switch command to the display 1207 and to the speech output switching controller, respectively. Further, the MCU 1203 exchanges information with the DSP 1205 and can access an optionally incorporated SIM card 1249 and a memory 1251. In addition, the MCU 1203 executes various control functions required of the terminal. The DSP 1205 may, depending upon the implementation, perform any of a variety of conventional digital processing functions on the voice signals. Additionally, DSP 1205 determines the background noise level of the local environment from the signals detected by microphone 1211 and sets the gain of microphone 1211 to a level selected to compensate for the natural tendency of the user of the mobile terminal 1201.
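
As a non-limiting sketch, the noise-dependent microphone gain described above might be computed as follows. The description does not give the measurement method or the gain law, so the RMS level estimate, the linear interpolation, the thresholds, and the assumption that gain is lowered as noise rises (on the premise that users tend to speak louder in noisy surroundings) are all hypothetical choices made for illustration.

```python
import math


def rms_dbfs(samples):
    """RMS level of samples in the range [-1, 1], in dB relative to full scale."""
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def mic_gain_db(noise_dbfs, quiet_dbfs=-60.0, loud_dbfs=-20.0,
                max_gain_db=12.0, min_gain_db=0.0):
    """Hypothetical gain law: full gain in quiet, reduced linearly as noise rises."""
    if noise_dbfs <= quiet_dbfs:
        return max_gain_db
    if noise_dbfs >= loud_dbfs:
        return min_gain_db
    fraction = (noise_dbfs - quiet_dbfs) / (loud_dbfs - quiet_dbfs)
    return max_gain_db + fraction * (min_gain_db - max_gain_db)


# Example: a noise estimate halfway between the two thresholds.
example_gain_db = mic_gain_db(-40.0)
```

In this sketch a noise estimate midway between the quiet and loud thresholds yields a gain midway between the maximum and minimum settings; a real DSP would likely smooth the estimate over time before applying it.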

[00166] The CODEC 1213 includes the ADC 1223 and DAC 1243. The memory 1251 stores various data including call incoming tone data and is capable of storing other data including music data received via, e.g., the global Internet. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. The memory device 1251 may be, but is not limited to, a single memory, CD, DVD, ROM, RAM, EEPROM, optical storage, magnetic disk storage, flash memory storage, or any other nonvolatile storage medium capable of storing digital data.

[00167] An optionally incorporated SIM card 1249 carries, for instance, important information, such as the cellular phone number, the carrier supplying service, subscription details, and security information. The SIM card 1249 serves primarily to identify the mobile terminal 1201 on a radio network. The card 1249 also contains a memory for storing a personal telephone number registry, text messages, and user specific mobile terminal settings.

[00168] While the invention has been described in connection with a number of embodiments and implementations, the invention is not so limited but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims. Although features of the invention are expressed in certain combinations among the claims, it is contemplated that these features can be arranged in any combination and order.

Claims

WHAT IS CLAIMED IS:

1. A method comprising:

determining one or more characteristic values from audio data, wherein the one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes;

mapping the one or more musical notes to a first visual characteristic of one or more visual elements, wherein the first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements;

mapping the respective durations to a second visual characteristic of the one or more visual elements, wherein the second visual characteristic includes a size of the one or more visual elements; and

generating a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

2. A method of claim 1, further comprising:

processing the audio data to determine an audio sequence of the one or more musical notes; and

determining one or more values for the first visual characteristic based on the audio sequence.

3. A method of claim 2, further comprising:

selecting at least one color space for the visual representation; and

mapping a color sequence for the one or more values of the first visual characteristic based on the at least one color space.

4. A method of claim 3, wherein the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

5. A method of claim 3, further comprising:

changing the color sequence for the one or more values of the first visual characteristic by shifting along the at least one color space.

6. A method of claim 2, further comprising:

generating a legend for correlating the one or more values for the first visual characteristic to the audio sequence, the one or more musical notes, or a combination thereof; and

presenting the legend in the visual representation.

7. A method of claim 1, further comprising:

specifying the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes.

8. A method of claim 1, further comprising:

varying the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes relative to one axis of the one or more visual elements while maintaining a fixed size along another axis of the one or more visual elements.

9. A method of claim 1, wherein the one or more visual elements is a geometric shape.

10. A method of claim 9, wherein the geometric shape includes a parallelogram.

11. A method comprising:

processing a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation, wherein the first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements, and wherein the second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements;

generating an audio sequence based on the one or more musical notes and the respective durations; and

initiating a playback of the audio sequence.

12. A method of claim 11, further comprising:

determining a mapping of the color, a pattern, a design, or a combination thereof to a tonal range, wherein the mapping was used to generate the visual representation; and

extracting the one or more musical notes from the visual representation based on the mapping.

13. A method of claim 12, further comprising:

determining at least one color space associated with the visual representation,

wherein the extracting of the one or more musical notes from the visual representation is further based on the at least one color space.

14. A method of claim 13, wherein the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

15. A method of claim 13, further comprising:

processing the visual representation to extract a legend for correlating the at least one color space to the tonal range or to the one or more musical notes,

wherein the mapping of the color space to the tonal range is based on the extracted legend.

16. A method of claim 11, further comprising:

processing the size of the one or more visual elements to determine the respective durations of the one or more musical notes.

17. A method of claim 11, wherein the respective durations of the one or more musical notes is encoded with respect to at least one axis of the one or more visual elements.

18. A method of claim 11, wherein the size of the one or more visual elements is proportional to the respective durations of the one or more musical notes corresponding to the one or more visual elements.

19. A method of claim 11, wherein the one or more visual elements is a geometric shape.

20. A method of claim 19, wherein the geometric shape includes a parallelogram.

21. An apparatus comprising:

at least one processor; and

at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following,

determine one or more characteristic values from audio data, wherein the one or more characteristic values include one or more musical notes and respective durations for the one or more musical notes;

map the one or more musical notes to a first visual characteristic of one or more visual elements, wherein the first visual characteristic includes a color, a pattern, a design, or a combination thereof of the one or more visual elements;

map the respective durations to a second visual characteristic of the one or more visual elements, wherein the second visual characteristic includes a size of the one or more visual elements; and

generate a visual representation of the audio data by rendering the one or more visual elements using the mapped first visual characteristic and the mapped second visual characteristic.

22. An apparatus of claim 21, wherein the apparatus is further caused to:

process the audio data to determine an audio sequence of the one or more musical notes; and

determine one or more values for the first visual characteristic based on the audio sequence.

23. An apparatus of claim 22, wherein the apparatus is further caused to:

select at least one color space for the visual representation; and

map a color sequence for the one or more values of the first visual characteristic based on the at least one color space.

24. An apparatus of claim 23, wherein the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

25. An apparatus of claim 23, wherein the apparatus is further caused to:

change the color sequence for the one or more values of the first visual characteristic by shifting along the at least one color space.

26. An apparatus of claim 22, wherein the apparatus is further caused to:

generate a legend for correlating the one or more values for the first visual characteristic to the audio sequence, the one or more musical notes, or a combination thereof; and

present the legend in the visual representation.

27. An apparatus of claim 21, wherein the apparatus is further caused to:

specify the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes.

28. An apparatus of claim 21, wherein the apparatus is further caused to:

vary the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes relative to one axis of the one or more visual elements while maintaining a fixed size along another axis of the one or more visual elements.

29. An apparatus of claim 21, wherein the one or more visual elements is a geometric shape.

30. An apparatus of claim 29, wherein the geometric shape includes a parallelogram.

31. An apparatus comprising:

at least one processor; and

at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following,

process a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation, wherein the first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements, and wherein the second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements;

generate an audio sequence based on the one or more musical notes and the respective durations; and

initiate a playback of the audio sequence.

32. An apparatus of claim 31, wherein the apparatus is further caused to:

determine a mapping of the color, a pattern, a design, or a combination thereof to a tonal range, wherein the mapping was used to generate the visual representation; and

extract the one or more musical notes from the visual representation based on the mapping.

33. An apparatus of claim 32, wherein the apparatus is further caused to:

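determine at least one color space associated with the visual representation,

wherein the extracting of the one or more musical notes from the visual representation is further based on the at least one color space.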

34. An apparatus of claim 33, wherein the at least one color space is selected from at least one of a Pantone color space, a RAL color space, a DIC color space, an ISO color space, a Chinese building color space, an NCS color space, a Munsell color space, and a Rite Digital color space.

35. An apparatus of claim 33, wherein the apparatus is further caused to:

process the visual representation to extract a legend for correlating the at least one color space to the tonal range or to the one or more musical notes,

wherein the mapping of the color space to the tonal range is based on the extracted legend.

36. An apparatus of claim 31, wherein the apparatus is further caused to:

process the size of the one or more visual elements to determine the respective durations of the one or more musical notes.

37. An apparatus of claim 31, wherein the respective durations of the one or more musical notes is encoded with respect to at least one axis of the one or more visual elements.

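38. An apparatus of claim 31, wherein the size of the one or more visual elements is proportional to the respective durations of the one or more musical notes corresponding to the one or more visual elements.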

39. An apparatus of claim 31, wherein the one or more visual elements is a geometric shape.

40. An apparatus of claim 39, wherein the geometric shape includes a parallelogram.

41. A computer-readable storage medium carrying one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform the following steps:

42. A computer-readable storage medium of claim 41, wherein the apparatus is further caused to perform:

sequence.

43. A computer-readable storage medium of claim 41, wherein the apparatus is further caused to perform: specifying the size of the one or more visual elements to be proportional to the respective durations of the one or more musical notes.

44. A computer-readable storage medium of claim 41, wherein the apparatus is further caused to perform:

45. A computer-readable storage medium carrying one or more sequences of one or more instructions which, when executed by one or more processors, cause an apparatus to at least perform the following steps:

processing a visual representation of audio data to determine a first visual characteristic and a second visual characteristic of one or more visual elements in the visual representation, wherein the first visual characteristic encodes one or more musical notes of the audio data using a color, a pattern, a design, or a combination thereof of the one or more visual elements, and wherein the second visual characteristic encodes respective durations of the one or more musical notes using a size of the one or more visual elements;

initiating a playback of the audio sequence.

46. A computer-readable storage medium of claim 45, wherein the apparatus is further caused to perform:

mapping.

47. A computer-readable storage medium of claim 45, wherein the apparatus is further caused to perform:

processing the size of the one or more visual elements to determine the respective durations of the one or more musical notes,

wherein the respective durations of the one or more musical notes is encoded with respect to at least one axis of the one or more visual elements.

48. A computer-readable storage medium of claim 45, wherein the size of the one or more visual elements is proportional to the respective durations of the one or more musical notes corresponding to the one or more visual elements.
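
For illustration only, and without limiting the claims, the forward conversion (notes to color, durations to size) and the reverse conversion (visual elements back to notes and durations) can be sketched as follows. Everything named in the sketch is an assumption: an evenly spaced HSV hue wheel stands in for the recited color spaces, rectangles stand in for the parallelogram elements, and `build_legend`, `encode`, `decode`, `px_per_beat`, and `height` are hypothetical names. Width varies with duration along one axis while height stays fixed, and the `shift` parameter rotates the color sequence along the hue wheel.

```python
import colorsys

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_legend(shift=0):
    """Map each note name to an RGB color; `shift` rotates the color sequence."""
    legend = {}
    for index, name in enumerate(NOTE_NAMES):
        hue = ((index + shift) % 12) / 12
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        legend[name] = (round(r * 255), round(g * 255), round(b * 255))
    return legend


def encode(notes, legend, px_per_beat=40, height=20):
    """Turn (note name, duration in beats) pairs into colored rectangles.

    Width is proportional to duration; height is fixed.
    """
    elements = []
    x = 0
    for name, beats in notes:
        width = round(beats * px_per_beat)
        elements.append({"x": x, "width": width, "height": height,
                         "color": legend[name]})
        x += width
    return elements


def decode(elements, legend, px_per_beat=40):
    """Recover (note name, duration in beats) pairs using the legend."""
    color_to_note = {color: name for name, color in legend.items()}
    return [(color_to_note[e["color"]], e["width"] / px_per_beat)
            for e in elements]


# Round trip on a short, made-up melody.
melody = [("C", 1), ("E", 0.5), ("G", 2)]
legend = build_legend()
elements = encode(melody, legend)
recovered = decode(elements, legend)
```

In this sketch the legend plays the role of the mapping between colors and notes, so the same legend that produced the visual elements is needed to recover the audio sequence from them. The round trip is exact here only because each duration maps to a whole number of pixels.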