KR20140092863A

KR20140092863A - Methods, systems, devices and computer program products for managing playback of digital media content

Info

Publication number: KR20140092863A
Application number: KR1020147014581A
Authority: KR
Inventors: 필립 샌트; 도미닉 블래치포드; 닐 하트; 매튜 화이트; 매튜 티게
Original assignee: Omnifone Limited
Priority date: 2011-10-31
Filing date: 2012-10-31
Publication date: 2014-07-24

Abstract

CLAIMS 1. A method of managing playback of one or more items of digital media content, e.g., to ensure a natural transition between items of digital media content, the method comprising: (a) providing a description defining how to manage playback of one or more items of digital media content; (b) identifying the digital media content; and (c) automatically controlling playback of the digital media content using the description in a digital media player. Related systems, computer program products, digital media players and servers are also provided.

Description

METHODS, SYSTEMS, DEVICES AND COMPUTER PROGRAM PRODUCTS FOR MANAGING PLAYBACK OF DIGITAL MEDIA CONTENT

The present invention relates to a method of defining playback criteria for one or more items of digital media content, for example to ensure a natural transition between items. The invention also encompasses systems, devices and computer program products related to the method.

A long-standing problem when playing digital media content has been deciding how to manage the transition from one item of content to the next.

Conventional solutions include simply sequencing the content, with or without an intervening gap, or fading down one item while fading up the next ("crossfading"), possibly with some overlap.

However, each of these approaches has inherent problems: simple sequencing can sound jarring to listeners, and crossfading can sometimes cost impact, for example when a crescendo is faded down to bring in the next music track.

The preferred embodiment of the present invention solves these problems by providing a mechanism, described below, that smooths the transition from one item to the next by managing the presentation and/or playback of one or more items of digital media content.

A further problem is that there is "dead air" during playback of digital media content, i.e., unintentional or inevitable silence to date. This is a particular problem with the service of streaming digital media content for playback on a client device in which case a particular segment of the content at the point where the end user wants to listen to the content has not yet been downloaded to the device, Problems can cause silence.

This latter problem can cause stuttering or silent gaps during playback of a track, or at the beginning or end of a track.

In addition, even where simple crossfading, which has existed in media players for some time, softens the transition from the end of one song to the start of the next, other changes between tracks still cause a hard stop (a "silent time"), delivering a jarring interruption. When the user presses pause, skips to the next track, seeks to a new point in the song, or simply selects a new song, the listening experience is abruptly cut off. This breaks the effect and shatters the listener's illusion.

The preferred embodiment of the present invention addresses many aspects of the user's playback experience, and also gains time that can be used to make the necessary server calls and to deliver a richer visual interface. Among other things, a media player enabled for the "Disc Jockey Markup Language" (DJML) can automatically compensate for situations where content is not yet available or is in a different style from the content previously played. By providing seamless transitions using interstitials and intelligent fading (both described below), the preferred embodiment of the present invention gives the user an overall seamless experience without "silent time".

The effect of an overall seamless, adaptive and dynamic music system is difficult to describe, because no such system has existed before.

In some embodiments, the preferred embodiment of the present invention may use digital signal processing (DSP) techniques to calculate metadata such as the mood or tempo of digital media content.

According to a first aspect of the present invention, there is provided a method of managing playback of one or more items of digital media content, e.g., to ensure a natural transition between items of digital media content, the method comprising the steps of:

(a) identifying a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata;

(b) automatically controlling playback of the digital media content using the description in the digital media player.

The method may be one wherein the description of a particular item of digital media content includes metadata identifying significant events or characteristics of the item, and the digital media player uses that metadata to automatically control playback of the item.

The method may be one wherein the description of a particular item of digital media content is a timeline description identifying where, and at what time, significant events in the item occur.

The method may be one wherein the descriptive metadata for a digital media content file includes one or more of: the start point of the actual content in the file; the end point of the actual content in the file; the regions of the file containing vocals; the tempo of the media content; the mood of the media content; the pitch of the media content; "hooks" in the content; suitable fade-in and fade-out points; the position of any chorus in the file; the location and type of any beat points in the file; any overlay locations where other content may be overlaid on the digital media content during playback; and any other metadata associated with controlling playback of the digital media content file.

The method may be one wherein the descriptive metadata for a digital media content file is identified automatically by applying digital signal processing (DSP) techniques to the file, identified manually, or identified by a combination of automatic and manual processing.

The method may further include generating the description defining how to manage playback, which may be done automatically, manually using tools created for that purpose, or by a combination of the two.

The method may be one wherein the description of how to manage playback includes one or more representations of the descriptive metadata for the digital media content, including but not limited to: the start point of the actual content in the file; the end point of the actual content in the file; the regions of the file containing vocals; the tempo of the media content; the mood of the media content; the pitch of the media content; "hooks" in the content; suitable fade-in and fade-out points; the position of any chorus in the file; the location and type of any beat points in the file; any overlay locations where other content may be overlaid on the digital media content during playback; and any other metadata associated with controlling playback of the digital media content file.

The method may be one wherein a "hook" is one or more extracted sections of a track of audio and/or video content identified as: (i) …; (ii) the most recognizable portion of the track; (iii) the "best" part of the track, as specified; (iv) related to one or more portions of other tracks, including but not limited to portions similar to portions of another track, such as a track that starts in a similar way; (v) calling the specified track to mind; or (vi) a combination of one or more of the listed criteria.

The method may be one wherein a "hook" is identified using one or more of digital signal processing (DSP) techniques, manual identification, or any other method.

The method may be one wherein one or more hooks from one or more tracks are combined using at least one of crossfading, juxtaposition, or any other technique for joining digital media content.

The method may be one wherein the description of how to manage playback includes one or more of: recommendations or requirements for how the digital media content file is cached on the client device; "fallback" digital media content to be played in place of the digital media content file when that file is unavailable for any reason; recommendations or requirements for what digital media content should be played after the digital media content file; how to apply audio and/or video processing, such as the initial volume to use for playback, how to apply track normalization, or any other playback criterion; selectively or otherwise overlaying one track on another, such as defining an audio, video or text commentary track for presentation alongside the currently playing track; how to control the tempo and/or pitch of the digital content during playback; any other type of audio and/or video processing used during playback, such as one or more of effects, equalization, volume normalization or compression; managing the presentation of digital media content to the end user in the client's user interface; and any other metadata associated with controlling playback of the digital media content file.

The method may be one wherein the description of how to manage playback includes techniques for managing transitions between two or more items of digital media content, including one or more of: the start and end times of the transition in the first file; the start and end times of the transition "end point" in the second file; which transition effect or combination of transition effects to use; the duration over which to apply any transition effect; what gap, if any, to use when transitioning from the first item of digital media content to the second; and any other metadata useful for defining transitions between digital content files.

The method may be one wherein the transition effects include linear, S-curve or parametric fades, fade-to-hold, fade-to-transition, slow fades and crossfades, and the description specifies the timing of the effect, the duration of the effect, or any other information related to applying a given transition effect.

The method may be one wherein the description of an item of digital media content is generated, using the descriptive metadata identified for that content, by a software application that produces the description in a standardized format such as XML, JSON or any other applicable format.

The method may be one wherein the description of how to manage playback describes a sequence of one or more items of digital media content, specifies any effects to apply during playback, and defines how to manage the transitions between each item of digital media content.

The method may be one wherein the description is generated by a software application, either from a manually or automatically provided list of digital media content files or from extracts of those files, in any standardized format such as XML, JSON or any other applicable format.

The method may be one wherein the digital media content file itself includes the description defining how to manage playback of one or more items of digital media content.

The method may be one wherein the digital media content file includes one or more digital media files and/or one or more extracts from two or more digital media files.

The method may be one wherein the description of how to manage playback of the digital content is applied by the digital media player, directly or indirectly, or by a plug-in to the digital media player, for the purpose of avoiding unintended silence, or "silent time".

The method may be a method wherein the digital media content is digital music content or digital video and audio content.

The method may be a method wherein the digital media player is a smartphone or tablet computer.

The method may be one wherein the descriptive metadata includes a point marking the end of the audio, distinct from the end of the file, after which the digital media file contains little or no effective audio content.

The method may be one wherein the descriptive metadata includes the start of the audio element in the audio file.

The method may be one wherein the description includes: definition metadata; instructions for caching; a fallback playlist; a streaming playlist; and a link for requesting more playlist items.

The method may be one wherein the descriptive metadata includes information on one or more of: which track to play; at which point to start playing each track; at which point playback of each track ends; how to play each track with respect to the audio and/or video processing to apply, such as the initial volume to use for playback, how to apply track normalization, or any other playback criterion; how to crossfade between tracks and what gap, if any, to use to smooth transitions, i.e., how to transition into and out of each track; which track to play after a given track, given either as a simple track identifier or as a set of selection criteria that the client application can use to choose from a selection of possible "next tracks"; how to handle the case where the "next track" is temporarily or permanently unavailable, such as by providing a pre-cached track to use as an alternative; managing the presentation of a track to the end user in the client's user interface; and selectively or otherwise overlaying one track on another, such as defining an audio, video or text commentary track for presentation alongside the track currently being played.
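The "next track" selection and fallback behavior described above can be sketched as follows. The selection criteria used here (same mood, tempo within a tolerance, closest tempo preferred) and the `is_available` callback are assumptions chosen for illustration; the specification leaves the criteria open, and `Candidate` and `choose_next` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    track_id: str
    tempo_bpm: float
    mood: str


def choose_next(
    candidates: list[Candidate],
    is_available: Callable[[str], bool],
    mood: str,
    tempo_bpm: float,
    tempo_tolerance: float,
    fallback_id: str,
) -> str:
    """Return the best available candidate, or the pre-cached fallback track."""
    # Keep only candidates that satisfy the (hypothetical) selection criteria.
    matches = [
        c for c in candidates
        if c.mood == mood and abs(c.tempo_bpm - tempo_bpm) <= tempo_tolerance
    ]
    # Prefer the closest tempo so the transition is least noticeable.
    for c in sorted(matches, key=lambda c: abs(c.tempo_bpm - tempo_bpm)):
        if is_available(c.track_id):
            return c.track_id
    # Nothing suitable is playable right now: fall back to the pre-cached track.
    return fallback_id


pool = [
    Candidate("x", 124.0, "upbeat"),
    Candidate("y", 121.0, "upbeat"),
    Candidate("z", 90.0, "calm"),
]
print(choose_next(pool, lambda t: t != "y", "upbeat", 120.0, 5.0, "cached-1"))  # x
```

Here "y" is the closest match but is treated as unavailable (for example, not yet downloaded), so the sketch moves on to "x"; if no candidate were playable it would return the pre-cached fallback rather than leave dead air.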

The method may further include playing audio only in response to a user-invoked play operation, after a session, web player or software app has been opened on the digital media player.

The method may further include providing a user interface to an end user to assist in researching, browsing and/or navigating digital media content, comprising the steps of:

(a) analyzing the digital media content to create, or to search for, "hooks" in the digital media content; and

(b) replacing or augmenting a graphical or textual representation of the digital media content with said "hook".

According to a second aspect of the present invention, there is provided a method of analyzing digital media content, comprising the steps of:

(a) identifying a collection of digital media files;

(b) performing DSP analysis on the collection of digital media files to automatically identify the start and end points of the audio within each file; and

(c) generating and storing metadata based on the DSP analysis.

The method may further comprise performing a DSP analysis on the digital media file to automatically identify the tempo and mood of the music in the file.

The method may further comprise performing DSP analysis on the digital media file to automatically identify potential overlay points (where audio can be overlaid on the file), "hooks", or any other metadata of the digital media file.

According to a third aspect of the present invention, there is provided a collection of digital media content files including a description defining how to manage playback of one or more items of digital media content. The collection may include one or more aperture files.

According to a fourth aspect of the present invention, there is provided a system including a digital media player and a content server, the digital media player being connectable to the content server via a content delivery network, and the content server being operable to provide content delivery to the digital media player in response to a call from the digital media player, wherein:

(a) the system is operable to identify a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata; and

(b) the system is operable to automatically control playback of the digital media content using the description in the digital media player.

The system may be one wherein the digital media player is operable to identify the description defining how to manage playback of one or more items of digital media content, the description including descriptive metadata.

The system may be one wherein the content server is operable to identify the description defining how to manage playback of one or more items of digital media content, the description including descriptive metadata, and to transmit the description to the digital media player.

According to a fifth aspect of the present invention, there is provided a system including a digital media player, an identification server and a content server, wherein the digital media player, the identification server and the content server are connectable to each other via a content delivery network, and the content server is operable to provide content delivery to the digital media player in response to a call from the digital media player, wherein:

(a) the identification server is operable to identify a description specifying how to manage playback of one or more items of digital media content, the description comprising descriptive metadata,

(b) the identification server is operable to transmit the description to the digital media player,

(c) the digital media player is operable to automatically control playback of the digital media content using the description.

The system according to the fourth or fifth aspect of the present invention may be a system operable to implement the method of the first or second aspect of the present invention.

According to a sixth aspect of the present invention, there is provided a digital media player forming part of a system according to the fourth or fifth aspect of the present invention.

According to a seventh aspect of the present invention, there is provided a content server forming part of a system according to the fourth or fifth aspect of the present invention.

According to an eighth aspect of the present invention, there is provided an identification server forming part of a system according to the fourth or fifth aspect of the present invention.

According to a ninth aspect of the present invention, there is provided a computer program product operable to perform a method of managing playback of one or more items of digital media content, e.g., to ensure a natural transition between items of digital media content, the method comprising:

(a) identifying a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata; and

(b) automatically controlling playback of the digital media content using the description in the digital media player.

The computer program product is operable to implement the method of the first or second aspect of the present invention.

A preferred embodiment of the present invention provides a method and system for supplying a client application with a "timeline" of one or more items of digital media content, to assist in analyzing, navigating, examining or presenting ("playing") that content.

At its core, the preferred embodiment of the present invention requires:

1. Identifying descriptive metadata of a digital media content file, such as the start and end points of the actual content in the file, its tempo and mood, and "hooks" in the content, by digital signal processing (DSP).

2. Using that metadata to generate a description defining how to play one or more items of digital media content, controlling not only the items of digital content themselves but also how the items transition and overlap.

3. Using the above description in a digital media player to provide a seamless content playback experience.

The markup language described herein is merely an illustrative embodiment; any suitable language with equivalent or appropriate semantics may be used to implement embodiments of the present invention.

FIG. 1 is a diagram schematically illustrating gap matching using dominoes.
FIG. 2 is a diagram showing crossfading.
FIG. 3 is a diagram showing an example of a slow fade (10 seconds).
FIG. 4 is a diagram showing an example of a slow fade switching to a fast X-fade (right).
FIG. 5 is a diagram showing an example of a fast X-fade.
FIG. 6 is a diagram showing an example of an X-fade on pause and play.
FIG. 7 is a diagram showing an example of seeking within a single audio stream.
FIG. 8 shows an example of Extensible Markup Language (XML) markup implementing part of the present invention, continued in FIG. 9.
FIG. 9 shows an example of XML markup implementing part of the present invention, continuing from FIG. 8; the code of FIGS. 8 and 9 forms a single listing.
FIG. 10 is a waveform display of audio indicating an identified hook.
FIG. 11 shows an example of a system including a digital media player, a content delivery network and a content server, wherein the digital media player is connectable to the content server via the content delivery network and the content server is operable to provide content delivery to the digital media player in response to a call from the digital media player, the system being operable:
(a) to identify a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata; and
(b) to automatically control playback of digital media content using the description in the digital media player.
FIG. 12 shows an example of a system including a digital media player, a content delivery network, an identification server and a content server, wherein the digital media player, the identification server and the content server are connectable to each other via the content delivery network, and the content server is operable to provide content delivery to the digital media player in response to a call from the digital media player, wherein:
(a) the identification server is operable to identify a description specifying how to manage playback of one or more items of digital media content, the description comprising descriptive metadata;
(b) the identification server is operable to transmit the description to the digital media player; and
(c) the digital media player is operable to automatically control playback of digital media content using the description.

Definitions

For convenience and to avoid needless repetition, the terms "music" and "media content" in this specification refer to all media content that is in, or can be converted into, digital form, including but not limited to songs, videos, television shows (series, seasons or individual episodes), computer games and other interactive media, images (photographs or otherwise), and all other media content.

Similarly, the term "track" denotes a particular item of media content, which may be a song, a television show, an e-book or a portion thereof, a computer game, or any other discrete item of media content.

The terms "playlist", "timeline" and "album" are used interchangeably to refer to a collection of "tracks".

The term "timeline" may also refer to any time-indexed data; DJML is an example of time-indexed data, specifically metadata.

The terms "digital media catalog", "digital music catalog", "media catalog" and "catalog" are used interchangeably to denote a collection of tracks and/or albums that a user can access for listening purposes. A digital media catalog may aggregate both digital media files and their associated metadata or, in another exemplary embodiment, digital media and metadata may be delivered from a plurality of such catalogs. There is no intention that only one such catalog exists, and the term encompasses concurrent or collective access to a plurality of different catalogs. The actual catalog used in any given operation may be fixed, or may vary over time and/or with the location or access rights of a particular device or end user.

The abbreviation "DRM" is used to refer to a "digital rights management" system or mechanism used to grant access to digital media files.

The verbs "listen", "view" and "play" are used to encompass any interaction between a human and media content, which may include listening to audio content, viewing video or image content, reading books or other textual content, interacting with interactive media content, analyzing, navigating or examining media content, or any combination of these.

The terms "user", "consumer", "end user" and "individual" are used interchangeably to refer to a person or group of people using the facilities provided by an interface. In all cases, the masculine includes the feminine, and vice versa.

The terms "device" and "media player" are used interchangeably to refer to any computing device capable of playing digital media content, including but not limited to MP3 players, television sets, home entertainment systems, home computer systems, mobile computing devices, game consoles, portable game consoles, IVE or other in-vehicle media players, and any other applicable device or software media player. Some of these devices are inherently capable of media playback.

The term "DSP" (digital signal processing) refers to any computational processing of digital media content to extract additional metadata from it. Such computed metadata may take many forms, such as the tempo of a music track or the location of one or more spots within a digital media file judged to be representative of the content as a whole.

The term "hook" refers to one or more portions of a digital media file identified, by DSP, manually or by any other means, as representative of the content as a whole. For example, a movie trailer consists of a series of one or more "hooks" from the movie, while a particularly memorable riff or line from a music track serves a similar identifying purpose.
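To make the idea of a hook concrete, the following Python sketch uses a deliberately naive stand-in for DSP analysis: it treats the highest-energy window of a given length as the hook candidate. This heuristic and the name `loudest_window` are assumptions for illustration only; the specification does not say how hooks are found, and a practical system would likely combine several features.

```python
from __future__ import annotations


def loudest_window(samples: list[float], sample_rate: int, window_s: float) -> tuple[float, float]:
    """Return (start_s, end_s) of the window with the highest total energy.

    Naive stand-in for hook detection: assumes "loudest" approximates "most representative".
    """
    n = int(window_s * sample_rate)
    if n <= 0 or n > len(samples):
        raise ValueError("window must be positive and no longer than the track")
    # Prefix sums of squared samples give any window's energy in O(1).
    prefix = [0.0]
    for x in samples:
        prefix.append(prefix[-1] + x * x)
    # max() returns the earliest window on ties.
    best = max(range(len(samples) - n + 1), key=lambda i: prefix[i + n] - prefix[i])
    return best / sample_rate, (best + n) / sample_rate


# Quiet, then a loud 5-second passage, then quiet again (100 samples per second).
track = [0.1] * 1000 + [0.9] * 500 + [0.1] * 1000
print(loudest_window(track, 100, 5.0))  # (10.0, 15.0)
```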

The terms "UX" and "user experience" are used interchangeably to refer to the experience of end users when interacting with particular embodiments of the present invention.

The term "X-fade" is used as an abbreviation for "crossfade", a playback operation that fades down the track being played while fading up the next track at a predetermined point in the transition. The exact mechanism and timing used to fade tracks down and up may vary between X-fade techniques, as described in detail below.
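An X-fade can be illustrated with gain curves. The sketch below assumes mono floating-point samples and computes (outgoing, incoming) gains for three curve shapes: linear, an S-curve built from the smoothstep polynomial, and an equal-power sine/cosine law. It then applies a curve over an overlap region. The names `fade_gains` and `crossfade` are hypothetical; the specification mentions linear and S-curve fades but does not give these formulas.

```python
from __future__ import annotations

import math


def fade_gains(t: float, curve: str = "linear") -> tuple[float, float]:
    """Return (outgoing_gain, incoming_gain) at normalized position t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)  # clamp into the transition window
    if curve == "equal-power":
        # Sine/cosine law: out**2 + in**2 == 1 at every point.
        return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)
    if curve == "linear":
        incoming = t
    elif curve == "s-curve":
        incoming = t * t * (3.0 - 2.0 * t)  # smoothstep: gentle start and end
    else:
        raise ValueError(f"unknown curve: {curve!r}")
    return 1.0 - incoming, incoming


def crossfade(a: list[float], b: list[float], overlap: int, curve: str = "linear") -> list[float]:
    """Mix the last `overlap` samples of `a` with the first `overlap` samples of `b`."""
    if not 0 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap must fit inside both tracks")
    head = a[: len(a) - overlap]
    mixed = []
    for i in range(overlap):
        t = i / (overlap - 1) if overlap > 1 else 1.0  # 0.0 .. 1.0 across the overlap
        g_out, g_in = fade_gains(t, curve)
        mixed.append(a[len(a) - overlap + i] * g_out + b[i] * g_in)
    return head + mixed + b[overlap:]
```

The equal-power curve is a common choice because, for uncorrelated material, it avoids the roughly 3 dB dip in combined power that a linear crossfade produces at its midpoint.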

The term "JSON" refers to "JavaScript Object Notation", a standard industry format used to describe data and metadata.

The terms "DJML" and "Disc Jockey Markup Language" are used interchangeably throughout this specification to refer, by way of non-limiting example, to any of the exemplary embodiments of the invention, including its principal embodiments, and to a digital media player constructed to implement those features.

Description - Preface

When additional functionality is paired with this concept, the result is a very powerful music system that lets the user hear the best, most interesting and most recognizable part of a song, and then choose to play it from the beginning or skip to the next one. This can turn basic navigation (where applied or switched on) into a fast decision-making process without user frustration, and becomes an interactive X-fade exploration method resembling a professionally produced media experience such as advertising or a radio station.

In its most general form, the preferred embodiment of the present invention describes how to define a timeline of tracks for playback, how those tracks are played, and how to transition between them.

One exemplary implementation of the invention essentially defines a radio station as a series of tracks, gaps, DJ commentary, advertisements or any other items, together with the method of transitioning between each item.

For example, a station designed to reproduce the experience of a radio station could be defined purely in terms of the "Disc Jockey Markup Language" (DJML), and played by a suitable client device that simply follows the markup's directions to play the identified tracks and the transitions between them.

Another exemplary implementation of the invention allows a tool to mix tracks using defined crossfading or other transition techniques; the output of the tool is a DJML file that can be played using any DJML-capable client application or device.

In a preferred embodiment, the present invention provides a way of specifying a playlist of audio and video elements that includes not only the tracks and videos themselves but also rich metadata controlling how the tracks transition and overlap.

DJML is intended to provide an experience that surpasses traditional broadcasting.

Identifying description metadata

The cornerstone of DJML is the definition of index points within known content. Examples of points that may be marked include:

• The start of the audio element in the audio file

• The vocal parts of the track

• Ideal fade-in and fade-out points

• Multiple chorus/hook points

• Points with no audio or quiet audio

• Points of interest

• Beat positions

• A point marking the end of the audio, distinct from the end of the file, after which the digital media file contains little or no effective audio content

• Any other descriptive metadata relevant to playback in a DJML-capable player
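Index points like these lend themselves to a time-ordered list. The minimal Python sketch below assumes times in seconds from the start of the file and free-form string labels; the `kind` values and the class names are hypothetical, not DJML vocabulary. It keeps the points sorted and answers range queries with the standard `bisect` module.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexPoint:
    time_s: float  # offset from the start of the file, in seconds
    kind: str      # hypothetical label, e.g. "audio_start", "chorus", "beat", "audio_end"


@dataclass
class TrackTimeline:
    track_id: str
    points: list[IndexPoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.points.sort(key=lambda p: p.time_s)  # keep time order for bisect

    def first(self, kind: str) -> IndexPoint | None:
        """Earliest point of the given kind, or None if the track has none."""
        return next((p for p in self.points if p.kind == kind), None)

    def between(self, start_s: float, end_s: float) -> list[IndexPoint]:
        """All points with start_s <= time_s <= end_s."""
        times = [p.time_s for p in self.points]
        lo = bisect.bisect_left(times, start_s)
        hi = bisect.bisect_right(times, end_s)
        return self.points[lo:hi]

    def effective_duration(self) -> float:
        """Length of the actual content, ignoring leading and trailing silence."""
        start, end = self.first("audio_start"), self.first("audio_end")
        if start is None or end is None:
            raise ValueError("timeline needs audio_start and audio_end points")
        return end.time_s - start.time_s
```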

The descriptive metadata described herein can be generated automatically by applying digital signal processing (DSP) techniques to the digital media content file. In another exemplary embodiment, the metadata is generated manually. In a preferred embodiment, the metadata is generated automatically in the first instance and then manually augmented or adjusted using a tool developed for that purpose.
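As one illustration of the automatic path, the sketch below estimates where the audio starts and ends by thresholding the RMS level of fixed-size frames. It assumes mono samples normalized to [-1.0, 1.0]; the frame size, the -50 dBFS threshold and the name `detect_audio_bounds` are arbitrary assumptions, and a production DSP pipeline would be considerably more robust (and finer-grained than one frame).

```python
from __future__ import annotations

import math


def detect_audio_bounds(
    samples: list[float],
    sample_rate: int,
    frame_size: int = 1024,
    threshold_db: float = -50.0,
) -> tuple[float, float] | None:
    """Estimate (start_s, end_s) of the audible region, or None if all frames are quiet."""
    threshold = 10 ** (threshold_db / 20.0)  # dBFS -> linear RMS amplitude
    loud = []  # indices of frames whose RMS exceeds the threshold
    for i in range(0, len(samples), frame_size):
        frame = samples[i : i + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > threshold:
            loud.append(i // frame_size)
    if not loud:
        return None
    start_s = loud[0] * frame_size / sample_rate
    # Clamp the last frame to the true end of the file (it may be partial).
    end_s = min((loud[-1] + 1) * frame_size, len(samples)) / sample_rate
    return start_s, end_s
```

For example, at 1,000 samples per second with 100-sample frames, half a second of silence followed by one second of tone and 0.3 seconds of silence yields `(0.5, 1.5)`.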

Once these points are known, in the preferred embodiment they enable DJML markup to be generated automatically for tracks in any given order.
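A much-simplified sketch of that automatic step follows. It assumes each track supplies three hypothetical markers (audio start, fade-out start, audio end), and that the next track begins at its own audio start exactly when the previous track begins fading out, so the tail from fade-out start to audio end is the overlap. The data structures and the `schedule` function are illustrative only, not the DJML schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrackMarks:
    track_id: str
    audio_start: float     # seconds into the file where real content begins
    fade_out_start: float  # seconds into the file where the fade-out begins
    audio_end: float       # seconds into the file where real content ends


@dataclass(frozen=True)
class ScheduledTrack:
    track_id: str
    global_start: float  # when the track becomes audible on the shared timeline
    file_offset: float   # where in the file playback begins
    next_starts: float   # global time at which the following track fades in
    global_end: float    # when this track falls silent


def schedule(tracks: list[TrackMarks]) -> list[ScheduledTrack]:
    """Lay tracks end to end, overlapping each fade-out with the next track's start."""
    plan: list[ScheduledTrack] = []
    clock = 0.0
    for t in tracks:
        if not t.audio_start <= t.fade_out_start <= t.audio_end:
            raise ValueError(f"inconsistent markers for {t.track_id}")
        start = clock
        next_starts = start + (t.fade_out_start - t.audio_start)
        end = start + (t.audio_end - t.audio_start)
        plan.append(ScheduledTrack(t.track_id, start, t.audio_start, next_starts, end))
        clock = next_starts  # the next track begins as this one starts fading out
    return plan
```

Skipping each file's leading silence via `file_offset` and starting the next track inside the previous fade-out are the two ingredients that remove dead air between tracks in this simplified model.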

In addition, in one exemplary embodiment, the DJML can be edited manually to produce a specific mix, as a disc jockey would. This can take the form of a playlist or slide show with DJML that makes it easy to assemble a music or video experience.

Representing Playback Metadata

In a preferred embodiment, DJML is represented as XML markup. In other exemplary embodiments, any other semantically equivalent form of markup, such as JSON or a binary data stream, may be used. In still other exemplary embodiments, DJML may be represented as a series of extensions to an existing playlist language, such as Synchronized Multimedia Integration Language (SMIL) v3.

For clarity, the following examples are given in the XML format used by the preferred embodiment of DJML. However, the invention is not limited to this representation, since it can easily be expressed in another language when required.

An example of a fully standardized XML representation of DJML is provided below.

It has the following basic structure:

1. General Definitions

2. Commands for caching

3. Fallback Playlist

4. Streaming Playlist

5. Link to request more playlist items
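The five-part structure above can be sketched with Python's standard `xml.etree.ElementTree`. Every element and attribute name below (`djml`, `definitions`, `define`, `cache`, `fallback`, `playlist`, `track`, `more`, `href`) is a hypothetical placeholder; the actual vocabulary is the one shown in FIGS. 8 and 9, and tag and attribute names may vary between embodiments.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_djml(
    definitions: dict[str, str],
    cache_ids: list[str],
    fallback_ids: list[str],
    stream_ids: list[str],
    more_href: str,
) -> str:
    """Serialize the five-part structure as XML (all names are hypothetical)."""
    root = ET.Element("djml")
    # 1. General definitions
    defs = ET.SubElement(root, "definitions")
    for name, value in definitions.items():
        ET.SubElement(defs, "define", name=name, value=value)
    # 2. Instructions for caching
    cache = ET.SubElement(root, "cache")
    for tid in cache_ids:
        ET.SubElement(cache, "track", id=tid)
    # 3. Fallback playlist
    fallback = ET.SubElement(root, "fallback")
    for tid in fallback_ids:
        ET.SubElement(fallback, "track", id=tid)
    # 4. Streaming playlist
    playlist = ET.SubElement(root, "playlist")
    for tid in stream_ids:
        ET.SubElement(playlist, "track", id=tid)
    # 5. Link to request more playlist items
    ET.SubElement(root, "more", href=more_href)
    return ET.tostring(root, encoding="unicode")


print(build_djml({"crossfade": "s-curve"}, ["t9"], ["t9"], ["t1", "t2"], "/djml/more?after=t2"))
```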

Of particular importance is item 5, the link to request more playlist items. A DJML playlist may contain only a single track and a link. In the case of a dynamically generated playlist, the client uses the link to request the next track. The link returns valid DJML, which is interpreted in the context of the existing DJML data.
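A client could consume such a dynamically extended playlist by following the link whenever it reaches the end of the tracks it holds. The sketch below uses hypothetical element names (`playlist/track`, `more` with an `href` attribute) and takes the network call as an injected `fetch` function so it can run offline; it simply appends the returned tracks and does not attempt the context-sensitive merging of returned DJML that the specification describes.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Iterator


def iter_tracks(first_doc: str, fetch: Callable[[str], str], max_pages: int = 10) -> Iterator[str]:
    """Yield track ids, following each document's <more href> link via fetch()."""
    doc = first_doc
    for _ in range(max_pages):  # guard against endless playlists
        root = ET.fromstring(doc)
        for track in root.iterfind("playlist/track"):
            yield track.get("id")
        more = root.find("more")
        if more is None or not more.get("href"):
            return  # no link: the playlist is complete
        doc = fetch(more.get("href"))


pages = {"p2": '<djml><playlist><track id="b"/><track id="c"/></playlist></djml>'}
first = '<djml><playlist><track id="a"/></playlist><more href="p2"/></djml>'
print(list(iter_tracks(first, pages.__getitem__)))  # ['a', 'b', 'c']
```

Because the function is a generator, a player can request the next page lazily, only when it actually needs another track, which fits the single-track-plus-link playlists described above.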

An example of XML markup that implements part of the present invention is shown in Figs. 8 and 9.

Actual markup terms, such as tag names, attribute names, available attributes and the implementation language, may vary from embodiment to embodiment as required for a given implementation of the invention.

In a preferred embodiment, part of the base metadata included in the DJML markup is generated automatically based on DSP (digital signal processing) of the digital media file. In another exemplary embodiment, the metadata is generated and/or fine-tuned manually. Examples of metadata generated automatically in the preferred embodiment include the start and end points of the audio in the file, the tempo and mood of the music in the file, an initial identification of potential overlay points (where audio can be overlaid on the file), and any additional metadata that can be derived from automated analysis of the digital media file.
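As a rough illustration of deriving the start and end points of audio automatically, the sketch below applies a plain amplitude threshold to decoded samples. The function name `audio_bounds` and the default threshold of 0.01 (on samples normalised to -1.0..1.0) are assumptions. A production DSP stage would more likely use windowed energy and would also estimate tempo, mood and overlay points.

```python
def audio_bounds(samples, sample_rate, threshold=0.01):
    """Return (start_s, end_s) of the audible content, or None if all silent.

    samples: decoded mono samples normalised to -1.0..1.0.
    A sample counts as audible when its magnitude reaches `threshold`.
    """
    audible = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not audible:
        return None
    start_s = audible[0] / sample_rate
    # +1 so the end point falls after the last audible sample
    end_s = (audible[-1] + 1) / sample_rate
    return start_s, end_s


# Two silent leading samples and near-silent trailing samples at 2 samples/s
print(audio_bounds([0.0, 0.0, 0.5, -0.4, 0.001, 0.0], sample_rate=2))  # (1.0, 2.0)
```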

Defining Playback

The markup language described herein allows a client application to be informed of one or more of the following, for example:

• Which track to play.

• At which point playback of each track starts.

• At which point playback of each track ends.

• How to play each track in terms of the audio and/or video processing applied, such as the initial playback volume, how to normalize tracks, or any other playback criterion.

• How to cross-fade between tracks, and how to use gaps (if any) to smooth transitions.

• Which track to play after a given track, given either as a simple track identifier or as a set of selection criteria the client application can use to choose among possible "next tracks".

• How to handle cases where the "next track" is temporarily or permanently unavailable, such as by providing a pre-cached track to use as an alternative.

• How to manage the presentation of the track to the end user in the client's user interface.

• Overlaying one track on another, selectively or otherwise, such as defining a commentary track of audio, video or text to present with the track currently playing.

• Any other relevant criteria.

A preferred embodiment of the present invention may enable the use of custom-designed gaps intended to ease the transition from one item of digital media content to the next.

Such a gap may be branding, advertising, or simply a transition element. It is constructed or selected from actual or suggested gap elements according to manual or DSP analysis of the starting point (the media item from which the transition starts) and the end point (the media item to which it leads).

Audio elements controlled by DJML may include, by way of non-limiting example, the following.

ㆍ Track

ㆍ Audio audition

ㆍ Gap (interstitial)

ㆍ Talk-over / overlay / tutorial / help / notification

ㆍ Transition

ㆍ Advertising

ㆍ Multimedia element blending

• Beat and key matching

ㆍ Rich metadata (genre, tempo, other related links) connected to unique media

ㆍ Spectral audio data (from which mood, energy, etc. can be identified)

ㆍ Presentation, and user editing of playlists / show reels / slide shows

ㆍ Synchronization between clients based on DJML metadata

DJML is intended to provide a client application with all the data needed to deliver a modern and future broadcast-quality digital experience through a set of simple constructs. DJML is based on the principle of a running timeline carrying various content-related events, such as starting, stopping, or changing the playback of particular items on the timeline.
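The running-timeline principle can be pictured as a time-ordered queue of events that fire as playback advances. The sketch below is hypothetical: `TimelineEvent`, `run_timeline` and the action strings are illustrative names, not part of DJML, and the sketch only shows the ordering idea.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TimelineEvent:
    time_s: float                         # position on the running timeline
    action: str = field(compare=False)    # e.g. "start", "stop", "change"
    item: str = field(compare=False)      # the media item the event affects


def run_timeline(events, until_s):
    """Return (time_s, action, item) for every event due by until_s, in time order."""
    queue = list(events)
    heapq.heapify(queue)  # ordered by time_s only
    fired = []
    while queue and queue[0].time_s <= until_s:
        event = heapq.heappop(queue)
        fired.append((event.time_s, event.action, event.item))
    return fired


events = [
    TimelineEvent(180.0, "stop", "track-1"),
    TimelineEvent(0.0, "start", "track-1"),
    TimelineEvent(170.0, "start", "track-2"),  # overlaps the end of track-1
]
print(run_timeline(events, until_s=175.0))
# [(0.0, 'start', 'track-1'), (170.0, 'start', 'track-2')]
```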

DJML can be used to control the playback of any number of audio elements that can be superimposed. In a preferred embodiment, each element can be controlled in the following manner.

ㆍ Start/end time within the content itself, such as "start playback 8 seconds into the content".

ㆍ Relative position (overlap) based on the start or end of content playback, such as "start playing 10 seconds before the current track finishes".

ㆍ Fade/cross-fade start and end points, including contextual fade strategies.

ㆍ The parameters required to define the fade, such as its duration or the data points on its curve, and the type of fade (e.g., linear, S-curve, or parametric fading).

ㆍ Tempo / Pitch / Time

Control of general tempo and pitch adjustment of the content

ㆍ Different types of sound processing, including effects

Equalization, volume normalization, compression and other audio and video processing

• Contextual user interface audio handling, based on data provided in the DJML timeline and unique media data provided internally.

The exemplary embodiments may implement one or more or all of the techniques described above.
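Relative positions such as "start playing 10 seconds before the current track finishes" must be resolved to absolute points on the timeline before the client can schedule them. The following is a minimal sketch, assuming the hypothetical anchor names `after_start` and `before_end`; `resolve_start` is likewise an illustrative name.

```python
def resolve_start(anchor, offset_s, current_start_s, current_duration_s):
    """Convert a relative position into an absolute timeline time in seconds.

    anchor: "after_start" or "before_end" of the current item (hypothetical names).
    """
    if anchor == "after_start":
        return current_start_s + offset_s
    if anchor == "before_end":
        # e.g. "start playing 10 seconds before the current track finishes"
        return current_start_s + current_duration_s - offset_s
    raise ValueError(f"unknown anchor: {anchor!r}")


# Current track starts at 30 s on the timeline and lasts 200 s
print(resolve_start("before_end", 10.0, 30.0, 200.0))  # 220.0
```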

In a preferred embodiment, DJML allows the client application to manage the overall audio experience for the user. This includes determining when to cross-fade from one piece of media to the next element, for example when pausing a track, switching tracks, browsing an artist's collection of works, and/or switching to a new channel.

Appendix B describes various use cases of the preferred embodiment of the present invention, in which DJML is used to control the operation of the digital media player to provide a seamless experience for the end user. A given exemplary embodiment need not implement every use case described.

Controlling caching

In a preferred embodiment, the present invention not only allows defining which elements of the audio/channel/playlist experience are pre-cached, in what order and with what priorities, but also allows control over the live streaming aspect of playback.

The control includes the following.

• How many content items are downloaded immediately.

• In what order.

• How long the client should cache each item.

For example, some gaps, such as jingles or generic talk-overs for the channel, are preloaded and stored by the client application for later use at a prescribed point in the timeline.
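The three caching controls listed above (how many items to download immediately, in what order, and how long to keep each) map naturally onto a small rule record. The `CacheRule` fields, the convention that a lower `priority` number is fetched first, and the functions `plan_downloads` and `is_expired` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CacheRule:
    item_id: str
    priority: int    # lower number = fetched earlier (assumed convention)
    keep_for_s: int  # how long the client should keep the item cached
    immediate: bool  # download now rather than on demand


def plan_downloads(rules, max_immediate):
    """Return the item ids to download immediately, in priority order."""
    wanted = sorted((r for r in rules if r.immediate), key=lambda r: r.priority)
    return [r.item_id for r in wanted[:max_immediate]]


def is_expired(rule, cached_at_s, now_s):
    """True once an item has been cached for longer than its rule allows."""
    return now_s - cached_at_s > rule.keep_for_s


rules = [
    CacheRule("channel-talkover", priority=2, keep_for_s=3600, immediate=True),
    CacheRule("station-jingle", priority=1, keep_for_s=86400, immediate=True),
    CacheRule("track-9", priority=3, keep_for_s=600, immediate=False),
]
print(plan_downloads(rules, max_immediate=5))  # ['station-jingle', 'channel-talkover']
```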

Avoidance of Silent Time

In conventional media players there is a delay, or silent gap, when playing one song after another, skipping, or simply selecting a track to play. This "delay" or "silence" is caused by a number of factors that together produce a large perceived gap in the audio, i.e. a long pause of silence or "silence time".

The main reasons for this, in no particular order, are:

• The delay in requesting and receiving the URL for the next track, especially in streaming media players.

ㆍ The time the player takes to buffer and switch to the ready state, i.e. the time taken to fill the new buffer until the player is ready to play.

• Silence at the beginning and end of each audio file. This can be very long at the end of some media.

• Natural fade-outs in some music. Depending on the real-world listening volume, these can create a psychoacoustic impression of a gap longer than the actual audio gap.

• In some UI implementations, a timeout delay is added so that a user skipping through tracks does not trigger multiple requests for media they do not intend to play. A typical value for this delay is 400 ms when skipping through the player (but not when selecting a new music source).
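The 400 ms skip delay described above behaves like a trailing-edge debounce: a request is issued only if no further skip press arrives within the window. A sketch, assuming press times in milliseconds sorted in ascending order (`debounce_skips` is a hypothetical name):

```python
def debounce_skips(press_times_ms, delay_ms=400):
    """Return the times at which a media request is actually issued.

    press_times_ms: skip-button press times in ascending order.
    A request fires delay_ms after a press only if no further press
    arrives within that window (a trailing-edge debounce).
    """
    requests = []
    for i, t in enumerate(press_times_ms):
        is_last = i + 1 == len(press_times_ms)
        if is_last or press_times_ms[i + 1] - t >= delay_ms:
            requests.append(t + delay_ms)
    return requests


print(debounce_skips([0, 100, 250, 1000]))  # [650, 1400]
```

For presses at 0, 100, 250 and 1000 ms, only two requests are issued, so the tracks skipped past at 0 and 100 ms are never fetched.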

In a preferred embodiment, DJML may specify a set of audio elements and metadata that can be used in an emergency to fill the silence if the client cannot stream the next required audio fragment.

In the preferred embodiment, such elements are pre-cached ahead of time using the caching rules, ensuring they are available for playback when needed.
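The emergency-fill rule reduces to a simple decision made when the current item ends. This is a hedged sketch: `choose_next_source`, its parameters and the `"next"` / `"silence"` return values are illustrative, and a real client would also apply the cross-fade rules when switching.

```python
def choose_next_source(next_ready_at_ms, deadline_ms, fallback_cache):
    """Decide what plays when the current item ends at deadline_ms.

    next_ready_at_ms: when the streamed fragment will be playable (None = unknown).
    fallback_cache: pre-cached emergency elements, in the order DJML lists them.
    """
    if next_ready_at_ms is not None and next_ready_at_ms <= deadline_ms:
        return "next"                 # the requested fragment arrived in time
    if fallback_cache:
        return fallback_cache[0]      # fill the silence with a pre-cached element
    return "silence"                  # nothing available: the case DJML tries to avoid


print(choose_next_source(1500, 1000, ["station-jingle", "generic-talkover"]))  # station-jingle
```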

Overlays and interruption points

In a preferred embodiment, DJML can define which portions of the timeline, or of a particular item, are suitable for an audio overlay during playback (e.g., a notification or a DJ talk-over). This prevents dynamic overlays from talking over important parts of a track, such as the vocals or chorus. In some embodiments, a priority scheme may be used to refine the definition further.
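One way a client might honour overlay-safe regions is to search them for the first slot long enough to hold the overlay. In this sketch, the `(start_s, end_s, priority)` window format, the rule that a higher priority means a more suitable region, and `find_overlay_slot` are all assumptions.

```python
def find_overlay_slot(windows, earliest_s, duration_s, min_priority=0):
    """Return the start time of the first slot that can hold the overlay, or None.

    windows: (start_s, end_s, priority) spans marked as safe for overlays,
    e.g. instrumental sections; higher priority = more suitable (assumed).
    """
    for start_s, end_s, priority in sorted(windows):
        if priority < min_priority:
            continue
        slot_start = max(start_s, earliest_s)
        if slot_start + duration_s <= end_s:
            return slot_start
    return None


# Intro, instrumental break and outro of a track; the vocal sections are not listed
safe = [(0, 12, 1), (45, 60, 2), (150, 170, 0)]
print(find_overlay_slot(safe, earliest_s=10, duration_s=4))  # 45
```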

Gaps (interstitials): what is a gap?

A gap is typically a short piece of digital media content designed to sit between two different items of digital media content. In the context of the preferred embodiment of the present invention, the primary uses of gaps in various exemplary embodiments are:

• To ease the transition between playback of two or more fragments of digital media content, as described below.

• To prevent silence during playback ("silence time"), for example by providing fragments of "fallback" content to play when the requested content is not yet available.

For simplicity, gaps are described here in terms of audio clips inserted between two different audio clips. However, in other exemplary embodiments of the present invention, techniques similar or identical to those described below may be used to produce video or audiovisual gaps (for example, for a movie, television show or computer game), or gaps for any other suitable digital media content.

In a preferred embodiment, the audio, audiovisual and/or visual gaps are used with a plurality of "hooks" identified in accordance with PCT Patent Application Serial No. PCT/GB2012/052634, which is incorporated herein by reference. Selected content from PCT Patent Application No. PCT/GB2012/052634 is provided in Appendix C. For example, in the case of a movie or television show, a plurality of hooks combined using video gaps may form a kind of automatically generated promotional trailer for that content.

Types of gap

In a preferred embodiment, the gap may be one or more of the following types.

· Branding, such as station idents

ㆍ Advertising

• Transitional elements, such as audio/music sequences that aid cross-fading between two music tracks.

The transition element can be pre-built or customized as described below.

In either case, each gap should be labeled to indicate what kind of transition (within specified criteria) it is suited to, by processing it through a suitable DSP algorithm to determine its rhythm, mood, tempo and/or any other relevant metadata.

For example, a given gap may be labeled as suitable for cross-fading from fast "rap" music (starting point) to a slow piano tune (destination point).

Note that in a preferred embodiment, the labeling of a gap may also include additional metadata, such as how the gap should be introduced into the main playback stream. For example, a gap may be labeled as suitable for simple sequenced insertion (without fading), with or without silent gaps, or for cross-fading (where the gap is designed to fade in as the outgoing track fades out).

In a preferred embodiment, how a gap may be used is specified for its intro and its coda in terms of its suitable starting point, its suitable destination point and its possible playback modes. In a preferred embodiment, this metadata is marked up using a suitable markup language as described herein.

In another exemplary embodiment, the gap is labeled with only a combination of one or more metadata elements described above in connection with the preferred embodiment.

Selection of gap

If any type of gap is pre-built, the main problem is to select the appropriate gap clip to use.

The initial data used to determine which gap to use is based on manual or DSP processing of:

• The "starting point", i.e. the origin of the transition, such as the current position in the currently playing track (which may be the end of the track);

• The "hook" or "end point", i.e. the destination of the transition, such as the beginning of the next track to play.

Once these two positions have been processed through suitable DSP algorithms to determine their rhythm, mood, tempo and/or any other relevant metadata, the problem reduces to selecting the type of gap required to soften the transition between them.

For example, if the "start" has an upbeat tempo and the "end" has a slow waltz beat, the selected gap should be an audio clip designed (or constructed) to smooth the transition between those two particular types of audio.
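Selecting a pre-built gap then amounts to matching its labels against the processed start and end points. The following is a minimal sketch, assuming each gap carries a single hypothetical label for each side plus the playback modes it supports. Real labels would combine rhythm, mood, tempo and other metadata. `Gap` and `select_gap` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    clip_id: str
    from_label: str                # label of the "start" side (hypothetical scheme)
    to_label: str                  # label of the "end" side
    modes: tuple = ("crossfade",)  # playback modes the gap supports


def select_gap(gaps, start_label, end_label, mode="crossfade"):
    """Return the first pre-built gap matching both labels and the mode, or None."""
    for gap in gaps:
        if gap.from_label == start_label and gap.to_label == end_label and mode in gap.modes:
            return gap
    return None


catalog = [
    Gap("gap-rap-to-piano", "fast-rap", "slow-piano"),
    Gap("gap-upbeat-to-waltz", "upbeat", "slow-waltz", ("sequence", "crossfade")),
]
print(select_gap(catalog, "upbeat", "slow-waltz").clip_id)  # gap-upbeat-to-waltz
```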

In a preferred embodiment, an existing gap is identified and labeled as suitable for any given transition that can be computed from the digital media catalog on which the preferred embodiment of the present invention operates.

If there is a very large number of possible combinations of start and end points, it may not be practical to build every possible gap element; a custom gap should then be constructed as described below.

Constructing custom gaps

In a preferred embodiment, a suitable gap is assembled from a series of pre-built gaps, sequenced by matching the beginning of the first gap to the starting point, the end of each gap to the beginning of the next, and the end of the last gap to the end point.

If there is no single gap corresponding to both the start and end points, in a preferred embodiment, additional gaps may be selected to complete the transition sequence.

The basic approach resembles a game of dominoes: there are many dominoes, and the head and tail of each must match whatever lies on either side of it. Figure 1 illustrates this approach: the "start" point is "2" and the end point is "3", but no pre-built gap transitions directly from "2" to "3". Intermediate gap "dominoes" are therefore used to soften the transition, from 2 to 5 to 6 to 3 in this example. If a shorter sequence is found (for example, 2 → 4 → 3), the shorter sequence is used instead in a preferred embodiment.
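The domino approach can be treated as a shortest-path search over gap labels, with each pre-built gap as an edge from its start label to its end label. Breadth-first search returns a chain with the fewest gaps, matching the preference for 2 → 4 → 3 over 2 → 5 → 6 → 3. The `(from_label, to_label)` tuple representation and `chain_gaps` are assumptions for illustration.

```python
from collections import deque


def chain_gaps(gaps, start, end):
    """Return the shortest list of gaps chaining start to end, or None.

    gaps: (from_label, to_label) pairs, one per pre-built gap. Breadth-first
    search explores chains in order of length, so the first hit is shortest.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        label, path = queue.popleft()
        if label == end:
            return path  # empty if start == end: no gap needed
        for gap in gaps:
            if gap[0] == label and gap[1] not in seen:
                seen.add(gap[1])
                queue.append((gap[1], path + [gap]))
    return None


dominoes = [(2, 5), (5, 6), (6, 3)]
print(chain_gaps(dominoes, 2, 3))  # [(2, 5), (5, 6), (6, 3)]
print(chain_gaps(dominoes + [(2, 4), (4, 3)], 2, 3))  # [(2, 4), (4, 3)]
```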

In the preferred embodiment, this custom gap is constructed on the fly when needed and uses the same playback rules described above for simple pre-built gaps. Once constructed, the custom gap is treated in exactly the same way as a pre-built gap.

Cross fading

Cross-fading between tracks and/or gaps may be performed by lowering the volume of one track/gap while simultaneously raising the volume of the track/gap being faded in.

In a preferred embodiment, the invention allows such crossfading to be defined in relation to the following.

ㆍ At which point of the currently playing track the cross fading process starts

• At which point of the "next track" the cross-fading process brings that track in

ㆍ What techniques are used for cross fading

Thus, a cross fade is defined as transitioning from one point of a first playitem to another point of a second playitem using a particular transition technique (or a set thereof).

In the preferred embodiment, the transition technique is defined as the effect to be applied and the duration for which it is applied (where applicable and preferred in a given embodiment). The effect may be one or more of linear, logarithmic, sine, cosine, S-curve, exponential or any other audio cross-fading technique. Similarly, in one exemplary embodiment, video transition techniques such as a wipe, bleach, fade-to-black or any other video cross-fading technique can be defined where applicable.
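The audio fade shapes named above can be written as gain functions of normalised time. The formulas below are common textbook choices, not definitions taken from this document. The exponential steepness `k = 4.0` and the function name `fade_gain` are assumptions. The outgoing item uses the mirrored curve `fade_gain(1 - t, shape)`.

```python
import math


def fade_gain(t, shape="linear", k=4.0):
    """Gain (0.0-1.0) of the incoming item at normalised fade time t in [0, 1].

    The outgoing item uses fade_gain(1 - t, shape). k sets the steepness of
    the exponential curve (an assumed value).
    """
    t = min(max(t, 0.0), 1.0)
    if shape == "linear":
        return t
    if shape == "sine":       # quarter sine: equal-power cross-fade
        return math.sin(t * math.pi / 2)
    if shape == "s-curve":    # raised cosine: slow start, fast middle, slow end
        return 0.5 - 0.5 * math.cos(t * math.pi)
    if shape == "exponential":
        return (math.exp(k * t) - 1) / (math.exp(k) - 1)
    raise ValueError(f"unsupported fade shape: {shape!r}")


# Gains of the incoming and outgoing items a quarter of the way through the fade
t = 0.25
print(round(fade_gain(t, "s-curve"), 3), round(fade_gain(1 - t, "s-curve"), 3))  # 0.146 0.854
```

With the `sine` shape, the squared gains of the two items always sum to 1, which keeps perceived loudness roughly constant across the fade.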

In a preferred embodiment, the effect to be applied and/or its duration are defined both for the track being faded out and for the track being faded in.

Appendix A provides definitions of some (but not all) of the possible cross-fading effects used in the preferred embodiment of the present invention.

Cross-fading of streaming media

When digital media content is streamed over a network, such as the Internet or any other network, the preferred embodiment uses the same cross-fading and gap-selection rules described above, with enough content pre-buffered that the client application can identify a suitable gap, or that the tracks and/or gaps can be mixed properly by the client application.

If such pre-buffering is not possible, in a preferred embodiment a suitable pre-identified gap may be inserted into playback to avoid unwanted gaps or silence.

Cross-fading of offline files

In a preferred embodiment, the present invention can manage the playback of tracks regardless of whether they are present on the client device, streamed from another device or remote server, or downloaded from a remote server or other device.

If the track is present in the client device, the preferred embodiment of the present invention can manage playback of those tracks as needed, whether the device is online or offline.

A typical application of the preferred embodiment of the present invention

In various exemplary embodiments, including its preferred embodiment, the invention helps with the following.

ㆍ Listening to broadcast-quality channels in a subscription music service.

DJML is used here to cross-fade between tracks and to provide the metadata the client needs to insert jingles, talk-overs and gaps at the appropriate points, in the same way as traditional radio programmers do.

• Previewing audio content composed of the "hooks" of each track, smoothly cross-faded together.

DJML is used here to specify the metadata around a playlist of tracks so that each track plays for 8 seconds starting at its hook time index, with a soft cross-fade between tracks.

• Providing a DJ-style mix of audio content, such as a special mix of live content or a simulated live event.

DJML is used to encode the start index and duration of each track and custom cross-fade parameters between each piece of audio.

• A user-facing UI for composing a mix of content using a DJML configuration, such as a user sharing his own track mix with friends.

DJML is manipulated by the user through a graphical interface that allows audio components to be selected, with overlays, effects and transitions between them.

• A more natural transition between playlist tracks.

Pressing "Next" during an audio segment in a DJML-enabled digital media player lowers the audio level using a fade; the fade time can be a default or can be derived from the DJML data. The next track is then started at the appropriate time, either by default or also based on the DJML data.

In this example, the next media fragment starts at a time derived from the fade-out of the media being skipped. The new media fragment is then played from its beginning (without fading). Exactly when this track starts is driven by the DJML data. For example, if the new media has a strong, distinct, loud opening, DJML knows exactly where the audio starts in the file (e.g., 500 ms from the beginning of the file), which provides a highly controlled user experience in both the auditory and visual domains.

• Smoother social network and messaging interactions

If the end user receives a notification (e.g., a new message, a friend logging in, or a status or system message) while playing digital media content, DJML can be used to specify the sections of the timeline in which an overlay may occur, including the time indices at which the application lowers the main media audio, delivers the interrupting audio, and then fades the media being played back in again. Delivery of the notification can be delayed until a point that suits the music.

• Client applications designed to let users create their own unique playlist / mixtape / DJ-style 'set' will use DJML to position music tracks automatically along the event timeline. The user edits this, and DJML provides index points and fade data that make it easy to snap events into place.

• Automatically generated playlists can be arranged based on the "best match" elements indicated by DJML.

• A user's music library can be shuffled to produce the best arrangement and order of music or media.

In some exemplary embodiments, DJML and the metadata associated with each audio element will be used to produce a pleasing "mix" of tempo/style, based on the metadata encoded in DJML markup combined with additional metadata such as specific user preferences or settings.

• Cue points are inserted manually and/or automatically, and cross-fades are then defined to produce DJ-style effects, with tempo and beat metadata encoded using the preferred embodiment of the present invention, so that any kind of EDM (Electronic Dance Music) can be mixed.

Select the next track

Just as similar processing is used to select which gap to provide between two known tracks, the processing described above and/or manual marking can be used to select (or help select) which track follows the currently playing track.

The selection of the next track to play may, in the preferred embodiment, be determined based on one or more of the following criteria.

• Music recommended by a recommendation engine for use in the service in which the preferred embodiment of the present invention is used;

• The speed/tempo/genre/era of the current and potential next tracks, determined manually and/or through DSP processing, or any other criteria disclosed for gap selection that can help smooth the transition from the current track to the next;

• Manual selection of the next track by the end user;

• Any other relevant criteria.

Additional applications

The preferred embodiments of the present invention provide some additional applications that are briefly mentioned in the above description. In a preferred embodiment, the invention allows the track and timeline to be marked up to incorporate zero, one or more of the following metadata.

• One or more (optional) commentary tracks that give the end user comments about the track currently being played.

• Text to display at particular times during playback, such as production notes, trivia or comments.

ㆍ Karaoke lyrics and timing.

• Definition of video and/or audio trailers for video content, such as a movie or a television show or series, in the form of DJML markup that indicates which portions of the source video are to be played, using which transition techniques, and, if required, defines an overlay commentary for the trailer. Tools similar to those described above can be used to generate the DJML definition of the trailer.

• Definition of alternative tracks for playback based on access or rights issues. For example, if a radio station is defined in DJML as described above but a track it normally plays is unavailable where the user is listening, an alternative track can be specified in the markup for playback to that user.

The method of marking up tracks described above is not limited to a single unified track. In one exemplary embodiment, it can be used to define how individual channels are mixed to form a coherent track. For example, a mixing desk or equivalent device or application may be used to define how the individual channels of music, effects and vocals are mixed to produce a given song, including any applicable transition and special effects. In this exemplary embodiment, the output of such a mixing desk would be a piece of DJML markup that defines the song in terms of its constituent parts. In another exemplary embodiment, various "remixes" of the track may be defined as alternative DJML definitions based on the core channel sounds.

In one exemplary embodiment, DJML-capable processing is embedded in the firmware/hardware of the device so that low-level cross-fading is DJML-controlled. In a further exemplary embodiment, this embedding may be in a mobile handset or portable consumer electronics device, so that cross-fading can occur without excessive battery usage, made possible in part by the standardized nature of the markup disclosed by the preferred embodiment of the present invention.

In one exemplary embodiment, the present invention may be used to identify items of digital media that form part of a fuller set, such as music that segues (i.e. moves without interruption from one song, melody or scene to another). In the preferred embodiment, DJML allows for such exceptional cases and can be used to define seamless transitions between the items without the need for cross-fading.

In a further exemplary embodiment, the present invention enables the identification of music pieces with a hard start and end. When such a piece is played out of context, the DJML markup disclosed by the preferred embodiment of the present invention will treat the fragment separately and, in one preferred embodiment, instruct the client to apply a fade at both ends.

System aspects

A system including a digital media player, a content delivery network and a content server, the digital media player being connectable to the content server via the content delivery network, and the content server being operable to provide content delivery to the digital media player in response to a call from the digital media player to the content server, the system being operable such that:

(b) the digital media player automatically controls playback of digital media content using the description. See, for example, FIG.

The system may be operable such that the digital media player identifies a description defining how to manage playback of one or more items of digital media content, the description including descriptive metadata. The system may also be operable such that the content server identifies a description specifying how to manage playback of one or more items of digital media content, the description including descriptive metadata, and transmits the description to the digital media player.

A system comprising a digital media player, a content delivery network, an identification server and a content server, wherein the digital media player, the identification server and the content server are connectable to each other via the content delivery network, the content server being operable to provide content delivery to the digital media player in response to a call from the digital media player to the content server, the system being operable such that:

(c) the digital media player automatically controls playback of the digital media content using the description. See, for example, FIG.

The content delivery network may be a wired network, a wireless network (e.g., a mobile phone network), or may include wired and wireless components. The digital media player may be a mobile phone, a smart phone, a tablet computer, a desktop computer, a laptop computer, a dedicated digital media player, or a computer game machine. The network may be the Internet or a mobile phone network. The digital media player can be portable. The digital media player may include a touch screen. The digital media player may include a GPS location system. The system may include a plurality of digital media players.

Note

It should be understood that the above-described arrangements are merely illustrative of the principles of the present invention. Many modifications and alternative arrangements may be devised without departing from the spirit and scope of the invention. Although the present invention has been shown in the drawings and described above with particularity and detail in connection with what is presently considered to be the most practical and preferred embodiment, it will be apparent to those skilled in the art that many modifications can be made without departing from the principles and concepts of the invention.

Appendix A: Adaptive X-Fade Technology

The following solution helps provide improved audio delivery on clients and platforms where we can control the elements of the playback engine.

The solution provided considers two logical elements:

1. Prefetching other tracks.

2. An automatic X-fade solution that covers all scenarios, including those in which no pre-buffer is available.

Both solutions require two or more audio players to be available.

The prefetch solution works by requesting and buffering fragments of media so that they are ready to play, meaning the media is ready when needed. However, we cannot prefetch every potential item; the solution applies only to media in the play queue. It is therefore not a complete solution for all use cases, since the user may choose a new play source that cannot be predicted.

The X-fade logic covers these other use cases and masks any perceived delay by smoothing the play experience. In fact, a good music user experience can be achieved simply by deploying the X-fade solution.

Prefetching nonetheless improves the experience greatly, because it gives us more control over the timing of the user experience.

Audio X-Fade solution

The X-fade solution described here in detail depends on the following:

• The ability to run multiple audio players, with two or more playing at any given time

ㆍ Time-controlled volume control

• Variable volume control path shapes, i.e. S-curve, linear, etc.

• The ability to change fade time and shape dynamically

Solutions are provided for the use cases below, including an interstitial solution that allows notifications or error conditions to be signalled through audio feedback.

• Pause / play

• Play sequentially into the next track (from end to beginning)

ㆍ Skip to the next track

ㆍ Skip back to the previous track

ㆍ Skip back to the beginning of the track

• Jump to a position in the track being played

• Play from any other selection point (not sequential)

For the UX, we propose the following solutions; the preferred solution is Solution 2.

1. A system that fades down to a predefined level, such as 50% or 25% of the original volume, and waits until the requested media is available. Once the media is available, a fast 2-second fade with a 1-second overlap is performed. This is known as 'fade-to-hold'.

2. Fade to silence over a set time (9-10 seconds), within which the media is expected to be ready; when the media is ready, a fast 2-second fade with a 1-second overlap is performed. This is known as 'fade-to-transition'.

The timings and values given here should be configurable so that the service can be tuned to its own or its users' requirements. They are initial estimates of how best to tune the system and should be tested.

Fade-to-hold

When switching between Player 1 (primary) and Player 2 (secondary), a volume fade to 0 (zero) must be performed. The default rules are:

• Start the fade on user action

• Fade to 25% over 2000 ms

• Hold at 25% until the next-track player is ready to play

• Continue fading to 0% over the next 2000 ms

The next track (pre-buffered Player 2, which now becomes the primary) must be triggered 500 ms before the old primary player finishes its fade, after which a stop command is issued to the old primary.
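The fade-to-hold rules can be expressed as a volume function of the time since the user action. This is a sketch under assumptions: the ramps are taken to be linear, the hold level and durations are the defaults given above, and if the next player becomes ready before the first 2000 ms ramp finishes, the final fade is assumed to begin when that ramp ends. `fade_to_hold` and `next_track_trigger_ms` are hypothetical names.

```python
def fade_to_hold(elapsed_ms, ready_at_ms, hold=0.25, down_ms=2000, out_ms=2000):
    """Volume (0.0-1.0) of the primary player elapsed_ms after the user action.

    ready_at_ms: when the next-track player reported ready (None = not yet).
    """
    if elapsed_ms < down_ms:
        # Phase 1: fade from 100% down to the hold level
        return 1.0 - (1.0 - hold) * elapsed_ms / down_ms
    fade_out_start = None if ready_at_ms is None else max(ready_at_ms, down_ms)
    if fade_out_start is None or elapsed_ms < fade_out_start:
        # Phase 2: hold until the next-track player is ready
        return hold
    # Phase 3: fade from the hold level to silence over out_ms
    progress = min((elapsed_ms - fade_out_start) / out_ms, 1.0)
    return hold * (1.0 - progress)


def next_track_trigger_ms(ready_at_ms, down_ms=2000, out_ms=2000, lead_ms=500):
    """Start the next player lead_ms before the primary's fade-out completes."""
    return max(ready_at_ms, down_ms) + out_ms - lead_ms


print(fade_to_hold(5000, ready_at_ms=4000), next_track_trigger_ms(4000))  # 0.125 5500
```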

Fade-to-transition

This system is the better approach. It provides a consistent feel and handles exceptions in a better and more consistent manner.

There are essentially two types of volume control to execute:

1. High-speed X-fade

2. Low-speed fade with X-fade

Fast X-fade is a two-second (2000 ms) linear volume ramp from 100% to 0% of player volume, with an X-fade/overlap of 0 to 1000 ms with the next player. The effect is a perceived transition of one second.

A slow fade is a 10-second (10,000 ms) linear volume ramp from 100% to 0% of player volume. If the next player's media becomes ready at any point during this process, the system switches to a fast X-fade (two seconds, one-second overlap) at that exact point. The effect is that the volume control responds to the user action by starting a slow fade and then turning it into a fast, smooth X-fade transition. It is like a DJ fading down while waiting for the next song to be ready, and then X-fading at exactly the right moment.

Pre-fetch ready: new media played through the pre-fetched player must start at full volume.

This behavior can later become adaptive, once media can be flagged as starting abruptly mid-waveform (i.e. a track from the middle of an album, or live material). Such media would then fade in smoothly (a 2-second (2000 ms) volume fade from 0% to 100%).

In a preferred embodiment, this is configurable, so that a short global fade-in can be tried if it suits the tuning of the system.

If the pre-fetch player is ready (fast X-fade):

Set the master player's volume transition time to a 2000 ms linear curve.

Start reducing the volume of the master player.

After 1000 ms, start/play the pre-fetched ready player (at full volume).

The primary completes its volume transition to zero.

Otherwise (slow fade), when no pre-fetched media player is ready to play:

Set the master player's volume transition time to a 10,000 ms linear curve.

When the pre-fetch player becomes ready, switch to a 2000 ms volume transition (fast X-fade).
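The branching logic above can be sketched in Python as a single envelope function. This is an illustration under stated assumptions, not the patent's implementation: `ready_ms` is a hypothetical parameter giving when the next player's media became ready (0 if it was already pre-fetched, None if it is not ready yet), and on switching, the fast X-fade is assumed to ramp from the outgoing player's current level down to 0% over 2000 ms.

```python
from typing import Optional, Tuple

def adaptive_fade_gains(t_ms: float, ready_ms: Optional[float] = None,
                        slow_ms: float = 10_000.0, fast_ms: float = 2000.0,
                        overlap_ms: float = 1000.0) -> Tuple[float, float]:
    """(outgoing, incoming) volumes t_ms after the user action.

    ready_ms: when the next player's media became ready
    (0 if already pre-fetched, None if not ready yet).
    """
    def slow_level(t: float) -> float:
        # Slow fade: linear 100% -> 0% over slow_ms.
        return max(0.0, 1.0 - t / slow_ms)

    if ready_ms is None or t_ms < ready_ms:
        return slow_level(t_ms), 0.0
    # Switch to a fast X-fade from the current level (assumption).
    start_level = slow_level(ready_ms)
    elapsed = t_ms - ready_ms
    outgoing = max(0.0, start_level * (1.0 - elapsed / fast_ms))
    incoming = 1.0 if elapsed >= fast_ms - overlap_ms else 0.0
    return outgoing, incoming
```

With `ready_ms=6000` (the six-second retrieval shown in Figure 4), the slow fade has reached 40% when the media becomes ready; the outgoing player then ramps to 0% by 8000 ms and the incoming player starts at 7000 ms. If the media is not ready before the slow fade reaches 0%, the sketch stays silent until it is, which is the exception described in the text.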

There are additional volume control rules for pause/play:

If the user pauses the media (master player), the player must fade the volume from 100% to 0% over 500 ms using an S-curve fade shape.

If the user plays paused media (master player), the player must fade the volume from 0% to 100% over 500 ms using an S-curve fade shape.
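The document does not define the exact S-curve. As an assumption, the Python sketch below uses the common smoothstep polynomial 3x^2 - 2x^3, which starts and ends with zero slope, to produce the 500 ms pause and play envelopes. The function names are hypothetical.

```python
def s_curve(x: float) -> float:
    """Smoothstep S-curve on [0, 1]; the specific curve is an assumption."""
    x = min(1.0, max(0.0, x))
    return x * x * (3.0 - 2.0 * x)


def pause_volume(t_ms: float, duration_ms: float = 500.0) -> float:
    """Volume t_ms after the user presses pause: 100% -> 0%."""
    return 1.0 - s_curve(t_ms / duration_ms)


def resume_volume(t_ms: float, duration_ms: float = 500.0) -> float:
    """Volume t_ms after the user presses play: 0% -> 100%."""
    return s_curve(t_ms / duration_ms)
```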

Exception

The media may be close to the end of its play time, in which case the slow fade may reach silence before the next media is ready. This scenario is hard to avoid, but if the service responds quickly the transition should last no longer than 2 seconds.

If the user pauses and then skips the media, no special rules apply (unless a fade-in is designed in, if desired).

Considerations

• Does the user-controlled UI 'master volume' affect an individual player's ability to make volume changes with enough granularity to ensure the quality of the X-fade?

That is, if the user has set the web client's 'master volume' to 10%, is there still enough granularity to change the volume?
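To make the concern concrete, the Python sketch below counts how many distinct output levels a full 0-100% player fade can produce under a given master volume. It assumes a hypothetical model in which the output is the product of master and player volume, rounded to a whole percent of full scale; real audio stacks may have finer or coarser steps.

```python
def distinct_output_levels(master_percent: int) -> int:
    """Distinct output levels reachable while a player fades 100% -> 0%.

    Hypothetical model: output = master x player volume, rounded to a
    whole percent of full scale; the player steps through 0..100%.
    """
    return len({round(master_percent * p / 100) for p in range(101)})
```

Under this model, a fade at 100% master volume can use 101 levels, but at 10% master volume only 11, so a 2000 ms fade would change level only about every 200 ms, which may be audible as stepping.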

When preview points (points of interest, preview time points, etc.) are added, the system can be set, per service or per user choice, to skip to the most interesting part of the media/song. From that point, the media continues to play until the user skips to the next item, jumps to the beginning, jumps to another point, or selects a new piece of media. The effect is that the system plays the highlight of the song and lets the user either continue playing from that point or play the song from the beginning, which makes it much easier and faster to listen to music and decide on it.

Slow Fade

Figure 3 shows that it takes 8 seconds to obtain the Data Distribution Service (DDS), fetch the audio stream/file, buffer it and transition to it.

An example of this use case is when the user clicks a song that has not been pre-fetched. If the DDS responds quickly, the slow fade switches to a fast X-fade.

Figure 4 shows a song that has not been pre-fetched taking six seconds to be retrieved, after which the slow fade switches to a fast X-fade.

Adaptive Audio X-Fade and Pre-fetching: Additional Examples of Logic

Figure 5 shows a fast X-fade of 2 seconds (2000 ms). The next track in the sequence has been pre-fetched, so it is prepared and waiting. The fade is linear and takes 2000 ms to complete; the new media starts playing at 1000 ms.

This creates a smooth transition and also masks the short stretch of blank audio at the start of almost all audio files.

Figure 6 shows pause and play fades using a 500 ms S-curve, giving a responsive, smooth experience.

Figure 7 shows two media players swapping the same audio stream when the user jumps within the same piece of media. This is the 'seek' (jump-to-position) experience.

The media players use a fast 500 ms true X-fade (the media transitions at the point jumped to) with an equal-power X-fade.
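"Equal-power" is read here as the standard constant-power crossfade; the document does not give the curve. A minimal Python sketch, assuming the usual cosine/sine gain pair over 500 ms, keeps the sum of the squared gains at 1, which holds perceived loudness roughly constant when the two signals are uncorrelated. The function name is hypothetical.

```python
import math
from typing import Tuple

def equal_power_gains(t_ms: float, duration_ms: float = 500.0) -> Tuple[float, float]:
    """(outgoing, incoming) gains t_ms into an equal-power X-fade."""
    x = min(1.0, max(0.0, t_ms / duration_ms))
    angle = x * math.pi / 2
    # cos^2 + sin^2 == 1, so total power stays constant through the fade.
    return math.cos(angle), math.sin(angle)
```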

The user plays the same song but jumps between points, perhaps checking their favorite parts, or simply checking that this is the song they want.

The last audio segment, on the right of the figure, shows the user performing a skip back (restarting the media from the beginning). In this example, X-fade is not used to restart the song.

Replaying the same song is a very common use case for musicians, and here the user experiences the repetition as smooth and professional. This contributes significantly to the playback and enjoyment of music.

Appendix B: Media Player Use of DJML

This appendix describes various use cases in the preferred embodiment of the present invention in which DJML is used to control the operation of the digital media player so as to provide a seamless experience to the end user.

The following description is exemplary only: a given embodiment need not implement every use case described here, nor implement any example exactly as described.

Player start

When a session, web player or software app is opened, no audio plays until the user performs a play action.

Could this be better?

Yes. An opening sound (intro sound) or the first part of a song could be played.

This identifies the service.

It also lets the user know whether the volume/headphones are working and set at the correct level.

solution:

Play a very short (3-second) streamed or preloaded file at startup.

Play a gap or audio fragment that is used to brand the service.

Let the user select a song clip to start the service.

• Resume playing from the point the user was at when they last exited (based on the last play state?).

Execution

• Start with an automatic 5-10 second fade-in (configurable per platform/service), ramping the volume from 0% to 100%.

Or start at 100% (depending on the gap audio, or if a fade-in is "baked in").

User skip (forward and backward through the play queue)

When the user skips to another track in the play queue/sequence/lineup/album/playlist using the [>>] or [<<] play controls.

There are a few variations of this use case:

ㆍ Skip forward to the next track

ㆍ Skip forward to a track in the play queue beyond the next track

ㆍ Skip back to the previous track

• Skip back past the previous track to a track in the play queue history

problem:

There is a delay, and hence silence, while the next track is requested, received, buffered and started.

Most audio files begin with a short stretch of blank audio (between 0 and 500 ms; about 400 ms on average, as an estimate).

• When a piece of music starts quietly (fades in), this silence can be perceived as much longer (up to a few seconds).

Could this be better?

ㆍ Yes. Eliminating this silence would be a breakthrough.

solution:

• Prefetch the next track.

• Prefetch multiple tracks (one back and three forward).

ㆍ Use volume control and/or cross-fade (with an additional audio player).
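As an illustration of the "one back and three forward" pre-fetch window, the Python sketch below lists the play-queue positions to pre-fetch around the current track. The function name and the clipping at the ends of the queue are assumptions.

```python
from typing import List

def prefetch_indices(queue_len: int, current: int,
                     back: int = 1, forward: int = 3) -> List[int]:
    """Play-queue positions to pre-fetch around the current track."""
    lo = max(0, current - back)
    hi = min(queue_len - 1, current + forward)
    # Every position in the window except the track already playing.
    return [i for i in range(lo, hi + 1) if i != current]
```

For a ten-track queue with the fifth track (index 4) playing, this returns `[3, 5, 6, 7]`.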

exception:

We cannot pre-fetch outside the immediate play queue/sequence.

We cannot pre-fetch for other use cases (i.e. when a song is chosen from elsewhere).

However, audio X-fade covers this scenario.

Execution:

• See the pre-fetch and audio X-fade logic (including the logic that arranges the players during transitions).

User skips back to beginning of current song

When the user skips back [<<] while a song is playing or paused, the song returns to its beginning. If the song has been playing for less than 5 seconds, the action instead jumps to the previous track in the play sequence.
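The skip-back rule can be sketched as a small Python function. The 5000 ms threshold comes from the text; treating the first track in the queue as simply restarting is an assumption, and the function name is hypothetical.

```python
from typing import Tuple

def skip_back_target(current_index: int, position_ms: float,
                     threshold_ms: float = 5000.0) -> Tuple[int, float]:
    """(track_index, start_ms) to play after the user presses [<<]."""
    if position_ms < threshold_ms and current_index > 0:
        return current_index - 1, 0.0  # early in the song: previous track
    return current_index, 0.0          # otherwise: restart the current song
```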

problem:

There is a small (very small) delay and a potentially harsh stop (if the song begins with a gap in the audio).

The delay is expected to be very small, because the stream is already active and the player is receiving it via an active DDS from a content delivery network (CDN).

However, a large song file may not be sufficiently cached in the CDN, in which case a much longer delay may be perceived.

Could this be better?

ㆍ Yes. Removing each such gap, however small, keeps the overall audio experience consistent, and it is achievable.

solution:

• Stream the beginning of the track into a new player, using the final (5th, or pre-4) player.

• 'Blend' the transition using a fast (500 ms) X-fade.

Execution:

User Changes Selection

When the user selects a piece of music from search results, a channel, a playlist, an artist selection, or any other context outside the play queue/selection/lineup, so that it has not already been pre-fetched.

problem:

In this use case, all existing music services exhibit an abrupt stop.

There is a delay while the client sends a request for the song and waits for it.

This delay can be very large, depending on CDN availability, network access or user home bandwidth.

Could this be better?

ㆍ Yes, absolutely. Smoothing out this scenario with an X-fade that mixes the audio while masking the delivery delay would be a breakthrough for music services.

solution:

Here we use the audio X-fade solution.

Focus on accelerating CDN performance.

Between Songs in Queue

When one song finishes and another starts, there is a small gap while the next song is fetched, buffered and started. The use cases are:

Between one song and another, unrelated song (playlist, channel, search result, artist result, etc.)

• Within a pre-existing sequence (album), where segues need to be considered.

problem:

There is a variable delay when fetching the next track in the list/play queue.

This varies between territories.

ㆍ It varies with the user's network performance.

It varies with whether the requested content is cached.

Some tracks occasionally have a long silent gap at the end (tail), 2-7 seconds or more in some cases, and a smaller gap at the start (top) that contributes significantly to the perceived gap.

• Songs meant to segue into each other (on a cue) do not mix perfectly.

Could this be better?

ㆍ Yes, through platform performance optimization, pre-fetching of the next track, and smoother transitions that cover the gap.

ㆍ Apply gapless playback to the three scenarios.

• Remove silent gaps at the beginning and end of a track.

solution:

• Prefetch the next track in the play queue.

• Potentially prefetch the next x tracks in the play queue.

To reduce the silent gap, use data indicating the actual start and end times of the audio within the audio file (i.e. the silence before and after the track's audio).
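One way to derive such start and end data is to scan the decoded samples for the first and last values above an amplitude threshold. The Python sketch below does this; the 0.01 threshold (on samples normalized to -1.0..1.0) and the function name are assumptions, and a real service might instead precompute and store these times as catalog metadata.

```python
from typing import List, Tuple

def audible_bounds(samples: List[float], threshold: float = 0.01) -> Tuple[int, int]:
    """(start, end) sample indices of the audible audio; end is exclusive."""
    # First sample whose magnitude exceeds the silence threshold.
    start = next((i for i, s in enumerate(samples) if abs(s) > threshold), None)
    if start is None:
        return 0, 0  # the whole file is silent
    # Walk back from the end over trailing silence.
    end = len(samples)
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return start, end
```

Dividing the indices by the sample rate gives times; at 44,100 Hz, for example, a start index of 17,640 corresponds to 400 ms of leading silence.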

• Use a fixed 5-second X-fade, or an X-fade of variable, user-set duration, to mix the transitions.

Use an adaptively timed X-fade driven by the fade-out at the end of one song and the start speed of the next track ... or by tempo, genre, etc.

Execution:

• Flag music that belongs to a cued (segued) collection, and ensure gapless playback when such songs are played in sequence by timing the two media players.

Find the gaps at the front and end of media in the catalog and reduce the silence.

• When there is no cue, ensure that the audio X-fade joins songs using the standard fast X-fade or a service/user-set X-fade (i.e. 1-12 seconds).
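The per-transition choice described in these steps can be sketched in Python as follows. It assumes a hypothetical `cued` flag on the pair of songs, uses the 2-second fast X-fade as the default, and clamps a service/user setting to the 1-12 second range from the text; all names are illustrative.

```python
from typing import Optional

FAST_XFADE_S = 2.0  # standard fast X-fade duration

def xfade_seconds(cued: bool, setting_s: Optional[float] = None) -> float:
    """X-fade duration for the transition into the next song in the queue."""
    if cued:
        return 0.0  # segued songs: gapless playback, no X-fade
    if setting_s is None:
        return FAST_XFADE_S
    return min(12.0, max(1.0, setting_s))  # service/user setting, 1-12 s
```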

Pause / Stop - Pause / Play

When the user pauses media that is playing, or plays (resumes) media that is paused.

problem:

An abrupt stop is sometimes experienced, which is more of a problem at high volume.

Could this be better?

ㆍ Yes. Smoother transitions are a subtle touch, but they make the service and user interface feel refined.

solution:

• Use fast and smooth volume control (fade out and fade in).

Execution:

ㆍ A user pause reduces the volume from 100% to 0% over 500 ms using an S-curve shape. Going from pause to play reverses this.

We should allow a rollback, in which the play point on play (resume from pause) is 500 ms before the point at which playback was paused. This compensates for the music missed and the timing lost during the pause fade (this should be configurable, in case 1000 ms proves better suited).
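The rollback rule is a one-line calculation. The Python sketch below clamps at the start of the track so that a pause within the first 500 ms resumes from 0; the function name is hypothetical, and the rollback is a parameter to reflect the configurability the text asks for.

```python
def resume_position_ms(paused_at_ms: float, rollback_ms: float = 500.0) -> float:
    """Play position to resume from, rolled back to cover the pause fade-out."""
    return max(0.0, paused_at_ms - rollback_ms)
```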

Seek (Fast Forward and Rewind)

When the user "seeks", i.e. jumps to a position in the media being played (using the timeline or another mechanism), the media stream moves to the new position.

problem:

There is a small (sometimes insignificant) delay while stream data for the new position is requested.

There may be a long delay if the media at the new position is not ready on the CDN.

The listening experience may change abruptly.

Could this be better?

ㆍ Yes. Where the silence is noticeable, removing it in each use case keeps the overall audio experience consistent; each gain is small, but together they give a much better audio experience.

solution:

Stream from the new media position using the final (5th, or pre-4) player.

• 'Blend' transitions using fast (500 ms) X-fades with S-curve shapes.

Execution:

Exceptions or Considerations:

The alternative (pre-4) player made available for this operation may need to request the same stream so that it is ready to play the media at the new position (in practice we found this was not needed: the alternative player only needed to request the already active stream).

The end of the play queue

At the end of the play queue or sequence, the media will stop playing (when not in repeat mode).

problem:

There is no problem as such; this is expected.

Could this be better?

ㆍ Yes. Background sounds or songs can be played.

This identifies the service.

It lets the user know that the sequence has ended and prompts them to select more media.

solution:

Play a very short (3-second) streamed or preloaded file after the media sequence ends.

• Play the gap or audio fragment used to brand the service.

Error

A system error caused by any of the following situations (or by something not yet identified) gives rise to an uncontrollable or unexpected situation. In information and visual presentation, we often get notifications or feedback about what is happening; this is currently missing in audio.

The system/service may recover within a few seconds. The following solution allows the client to retry.

ㆍ Server maintenance.

ㆍ CDN error

ㆍ File error.

ㆍ ISP error.

ㆍ Bandwidth problem

ㆍ Delay in CDN/server.

ㆍ User connection problem.

When any of the above occurs, the playing media will not have cached/buffered the entire song.

problem:

ㆍ The media stops. That is, the sequence fails to complete, or the media currently playing is stopped midway due to an error.

Could this be better?

ㆍ Yes. We can anticipate media delivery errors, because there is enough buffered media to run the solution.

• Letting users hear that there is a problem is a better experience than silence.

The error gap can be branded audio, or suitably polite custom audio that breaks the bad news to the user.

solution:

Play preloaded gaps.

• Fade out the audio before the end of the buffered audio is reached, then trigger the X-fade to the error-state gap.

Fade back in once enough audio has been buffered to resume playback.

• See the pre-fetch section, which describes a similar approach for long waits of 10 seconds or more.

exception:

Some logic may be needed to avoid a situation in which playback repeatedly nears the end of the buffer (before the media's natural end), triggers a fade-out and resumes, only for the cycle to repeat because of a bad connection or other problem. To avoid this, it is proposed that a longer buffering time be required before resuming within a single piece of media, doubling the time spent waiting for the buffer to fill each time this occurs.
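The error-handling steps and this exception can be sketched together as a small Python class. Everything here is an assumption for illustration: the 500 ms fade-out, the 2000 ms initial resume threshold, and the class and method names are hypothetical. The fade is triggered early enough to finish before the buffer runs out, and the resume threshold doubles on each repeated underrun within the same media.

```python
class BufferGuard:
    """Hypothetical underrun handling for one piece of media."""

    def __init__(self, fade_out_ms: float = 500.0,
                 initial_resume_ms: float = 2000.0) -> None:
        self.fade_out_ms = fade_out_ms
        self.resume_threshold_ms = initial_resume_ms
        self.underruns = 0

    def should_fade_out(self, buffered_ahead_ms: float,
                        remaining_in_media_ms: float) -> bool:
        # An underrun only if the buffer ends before the media's natural end,
        # and only once there is just enough audio left to complete the fade.
        return (buffered_ahead_ms < remaining_in_media_ms
                and buffered_ahead_ms <= self.fade_out_ms)

    def on_underrun(self) -> None:
        if self.underruns > 0:
            self.resume_threshold_ms *= 2  # wait longer on each repeat
        self.underruns += 1

    def can_resume(self, buffered_ahead_ms: float) -> bool:
        # Fade back in only once enough audio has been buffered.
        return buffered_ahead_ms >= self.resume_threshold_ms
```

A new guard would be created for each piece of media, so the longer waits apply only within the media that keeps stalling.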

Appendix C: Method, System and Computer Program Product for Navigating Digital Media Content

Summary

According to a first aspect, there is provided a method of providing a user interface to an end user to assist in the search, browsing and/or navigation of digital media content.

The method can also include:

Providing a "hook" to the end user so that the end user can use the "hook" to search, browse and/or navigate digital media content.

Providing integrated sound as a background to hide any silent holes or gaps in playback.

A method in which the integrated sound is played in the background to hide any silent holes or gaps in playback and/or to provide a consistent aural cue that the audio user interface is in operation.

A method in which the integrated sound consists of a hum, a crackling sound, white noise, the sound of a listener, a station identifier, or any other audio and/or video content.

A method in which a "hook" consists of one or more extracted sections of a track of audio and/or video content, identified as: (i) representative of the track as a whole; or (ii) the most recognizable portion of the track; or (iii) the "best" part of the track, as specified; or (iv) associated with one or more portions of another track, including but not limited to portions similar to portions of another track, such as tracks that start in a similar manner; or (v) evocative of the specified track; or (vi) a combination of one or more of the listed criteria.

A method in which the "hook" is identified manually, using one or more digital signal processing ("DSP") techniques, or in any other way.

A method in which a "hook" is composed of one or more hooks from one or more tracks ("per-track hooks"), combined into a single hook by one or more of cross-fading, juxtaposition, or any other technique for combining digital media content.

A method in which the technique used to combine per-track hooks into a single hook is determined by DSP analysis of the individual tracks and/or their per-track hooks.

A method in which a set of tracks is previewed by playing a hook for the set, created by combining the per-track hooks of the tracks making up the set.

A method in which the set of tracks is a playlist, a set of search results, a group of tracks formed according to metadata, or any other grouping of tracks.

A method in which the metadata used to form a group of tracks may include the artist, actor, genre, year of publication, republication or creation of the track, the album or albums on which the track appears, the track's popularity within a given group of service users, or any other metadata recorded for the track.

A method in which playback of the hook for a track or set of tracks is triggered by an action performed by the service user.

A method in which an action performed by the service user while the current track continues to play triggers playback of the per-track hook of another track, after which the currently playing track resumes or continues by some other means. The hook may fade in and then fade out while the current track continues to play; or the current track may be paused during playback of the hook; or the current track may be replaced by the hook for the duration of the hook.

A method in which the decision on how to play the hook is made using DSP processing, according to parameters specified for the particular service, device or user, to determine the volume at which the hook is played and/or to ensure that the hook can be heard clearly without interruption.

A method in which an action performed by the service user during playback of a hook can trigger playback of the track from which that per-track hook is derived.

A method in which playback of the track starts at the beginning of the track, at the point in the track from which the hook was extracted, or at any other point.

A method in which the action may be a mouse click on a graphical user interface element, a tap on a specific area of a touch-sensitive interface, a specific keyboard command, a specific voice command, a specific gesture identified through a mouse, touch-sensitive or motion-sensitive interface, or any other machine-recognizable action.

A method in which a hook for a track and/or set of tracks is played in the background while the user browses the track or set of tracks.

A method in which playing audio and/or video content in the background includes, as non-limiting examples, cross-fading between hooks, including per-track hooks; playing the hook at a lower than normal volume; playing the hook using a 3D audio effect technique so that the sound appears to originate from a specific location, e.g. behind or beside the listener; or any other method or combination of methods designated as indicating that the hook is played in the background.

A method in which a user of the service browses a track or set of tracks by browsing the hooks for the track or set of tracks, in addition to or instead of browsing through a graphical and/or textual interface.

A method in which, in addition to playing hooks, audio narration replaces or augments any or all of the other visual elements of the graphical interface, so that the interface can be used by blind or partially sighted people.

A method for providing an audio user interface ("AUI") to an end user.

A method in which the "hook" comprises an audio "hook".

A method applied to a system including a display, a speaker and a computer, the computer being configured to display a graphical or textual representation of the digital media content on the display, and to output a "hook" relating to the digital media content using the display and/or the speaker.

A method wherein the display comprises a touch screen.

A method in which the system is a personal portable device.

The personal portable device is a mobile phone.

The system comprises a microphone and the computer is arranged to receive a voice input via the microphone.

The system is operable to receive a user selection of digital media content.

The digital media content is digital music content.

The digital media content is digital video content.

The digital video content is a movie, a television show or a computer game.

According to a second aspect, there is provided a system including a display, a speaker and a computer system, the computer system being configured to display a graphical or textual representation of digital media content on the display and to output, using the display and/or the speaker, a "hook" relating to the digital media content, the system being operable to provide a user interface to an end user to assist in searching, browsing and/or navigating digital media content, wherein the system is operable to:

(a) analyze the digital media content to create a "hook" for the digital media content, or to search for "hooks"; and

(b) replace or enhance the graphical or textual representation of the digital media content with the "hook".

The system may be operable to implement the method according to the first aspect.

According to a third aspect, there is provided a computer program product, embodied on a non-transitory storage medium or in a cellular mobile telephone device or other hardware device, operable to perform a method of providing a user interface to an end user to assist in the search, browsing and/or navigation of digital media content.

The computer program product may be operable to implement the method according to the first aspect.

Disclosed herein is a mechanism for providing an audio user interface ("AUI") to the end user, so that digital media content can be navigated without relying entirely on graphical mechanisms.

For simplicity, the AUI disclosed herein is described in connection with an audio interface for navigating a music catalog. However, techniques similar or identical to those described below may also be used, as further exemplary embodiments of this appendix, to provide an interface for navigating a catalog of video such as movies, television shows or computer games, or of any other suitable digital media content.

Details

Audio user interface

Some elements of the audio user interface are described below. Any single element may suffice on its own for an embodiment of this appendix, but the preferred embodiment uses every element described below.

Hook

The core component of the AUI ("Audio User Interface") is the "hook".

A "hook" is a fragment of audio, video, or both, identified within a piece of digital media content, that is representative of the content, either because it evokes the content or because it is a particularly distinctive or recognizable section of it.

For example, the opening bars of Beethoven's Fifth Symphony are considered an identifiable "hook" for that piece. A hook may also be a short segment of speech, or a particular repeated passage or other sequence from a popular music track (for example, Lulu's cry of "Weeeeeeelllllll" at the beginning of "Shout", or a particular repeated section from the middle of Michael Jackson's "Thriller"). Similarly, a sequence recorded from one or more scenes of a movie or television show, or from a computer game, may be identified as a "hook" for an item of digital media content (examples of such video hooks can be found in trailers).

Various methods for identifying such "hooks" already exist in the art, including both automatic detection of hooks via DSP (digital signal processing) techniques and manual identification, whether or not developed or customized for use with the examples of this appendix.

However they are identified, one or more "hooks" may characterize a given piece of digital media content and can then be used in an audio user interface (AUI).

Hooks are typically short snippets of audio/video content, 10 seconds or less in duration, and in preferred embodiments about 1-6 seconds long.

Figure 10 is a graph on which a number of identified hooks are displayed. In this example, point 1 marks the start of the vocals, point 2 is an identified repeated section that evokes the tenor of the piece, and point 3 is a memorable section. How each hook in Figure 10 was identified is not important for the purposes of the present invention; what matters is that such hooks can be identified for use within the AUI, whether they are marked as points in the track automatically or manually.

Hooks in a digital content file may be identified, for example, by finding the portions of the file with the greatest change in tempo, volume, musical key or frequency spectrum content, as would be apparent to those skilled in the art.
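As a rough illustration of the volume criterion alone, the Python sketch below splits mono samples into fixed windows, measures each window's RMS level, and takes a hook starting at the window boundary with the largest jump in level. The window and hook lengths (0.5 s and 4 s, within the 1-6 second range given for hooks) and the function names are assumptions; tempo, key and spectral changes are not considered.

```python
import math
from typing import List, Tuple

def rms(block: List[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0


def find_hook(samples: List[float], rate: int, window_s: float = 0.5,
              hook_s: float = 4.0) -> Tuple[int, int]:
    """(start, end) sample indices of a candidate hook, by largest volume change."""
    win = max(1, int(window_s * rate))
    levels = [rms(samples[i:i + win]) for i in range(0, len(samples), win)]
    hook_len = int(hook_s * rate)
    if len(levels) < 2:
        return 0, min(len(samples), hook_len)
    # Change in level between each window and the one before it.
    jumps = [abs(levels[i] - levels[i - 1]) for i in range(1, len(levels))]
    best = 1 + max(range(len(jumps)), key=jumps.__getitem__)
    start = best * win
    return start, min(len(samples), start + hook_len)
```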

Browsing Sets of Tracks Using Hooks

A group of tracks, such as a playlist, a set of search results, a channel (e.g. as disclosed in WO2010131034 (A1), which is incorporated herein by reference), the tracks preferred by a given user or group of users, an album, an artist's discography (in whole or in part), tracks selected by the user, recently released tracks, tracks to be released, or any other group of tracks, can be browsed according to the examples of the present invention by triggering playback of its hooks.

In the preferred embodiment of this appendix, a set of tracks can be "previewed" by playing the hooks of each of its constituent tracks in succession.

In one exemplary embodiment, each hook may then be cross-faded into the next to form an apparently seamless audio sequence that gives a clear indication of the nature of the set of tracks. In another exemplary embodiment, the hooks are simply played one after another, without gaps between them and without cross-fading. In another exemplary embodiment, the hooks are played in succession with a very short gap between each hook. In a preferred embodiment, DSP processing of each hook is used to decide which transition or "cross-fading" technique is used in each case.
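The three sequencing options can be compared by computing when each hook starts. The Python sketch below does so for a list of hook lengths; the 1000 ms cross-fade and 200 ms gap defaults and the function name are hypothetical, and the cross-fade is assumed to be shorter than every hook.

```python
from typing import List

def preview_schedule(hook_lengths_ms: List[float], mode: str = "crossfade",
                     xfade_ms: float = 1000.0,
                     gap_ms: float = 200.0) -> List[float]:
    """Start time (ms) of each hook when previewing a set of tracks."""
    # Where the next hook starts relative to the end of the current one.
    offset = {"crossfade": -xfade_ms, "gapless": 0.0, "gap": gap_ms}[mode]
    starts: List[float] = []
    t = 0.0
    for length in hook_lengths_ms:
        starts.append(t)
        t += length + offset
    return starts
```

For hooks of 4000, 3000 and 5000 ms, a cross-faded preview starts them at 0, 3000 and 5000 ms, whereas gapless playback starts them at 0, 4000 and 7000 ms.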

In a preferred embodiment, the user moves the mouse cursor over a playlist (or moves a finger over it on a touch interface, gives a voice command on a voice interface, or uses any other triggering mechanism described below), thereby triggering playback of the hooks for the tracks in the playlist; each hook then cross-fades into the next to give the user an overall "feel" for the playlist's content. At any point, a command such as a single or double tap on a "play" control can trigger playback of the entire playlist, or of the particular track associated with the hook currently playing. Such commands are described in more detail below.

If a set of tracks is browsed while a track is playing, the hooks for the set are handled, in the preferred embodiment, in the same way as hooks for individual tracks, using the techniques described below.

Browsing Tracks Using Hooks

Browsing tracks within the audio user interface (AUI) relies on hooks to give the user a usable cue to the nature of the audio content being browsed.

In a conventional graphical user interface (GUI), a group of tracks, such as tracks to be released, selected tracks or search results, can be browsed by navigating a list of track titles or artwork. However, such an interface gives no cue to the nature of the tracks to be released: to find out how a track sounds, the user has had to play it explicitly, up to the point at which the track or its style becomes recognizable.

In contrast, the AUI allows the user to check tracks that are about to be released, even while listening to the track currently playing, if desired. In the preferred embodiment, this is done by fading down the track currently playing (if any) and fading in the hook of the track to be released, before fading back to the track currently playing (i.e. "cross-fading" from track to hook and back again). In a preferred embodiment, the "cross-fading" is performed using the techniques disclosed in Omnifone's patent applications GB1118784.6, GB1200073.3 and GB1204966.4, which are incorporated herein by reference.

By playing only the hooks of the tracks to be released, the user can sample the "flavor" of each track (its mood, genre, tempo, suitability, etc.) without listening to the whole track. Because this sampling is done by ear, rather than by simply looking at the track title, artwork or a textual description, the user can more easily decide whether to listen to the whole track, even one he or she has not heard before.

In another exemplary embodiment, the track currently playing (if any) is effectively paused while the "hook" of the track to be released is played, and restarted after the hook has played. In another exemplary embodiment, the hook is simply inserted in place of the currently playing track, without cross-fading. In yet another exemplary embodiment, the track currently playing continues and the hook is played at the same time, either cross-faded, played at a different volume, or using any other technique that distinguishes the hook from the track currently playing.

In another exemplary embodiment, the technique used to play the hook is selected dynamically based on digital signal processing of the track currently playing and the hook. In this case, a loud hook played during a quiet section of the current track can be played more quietly without reducing the volume of the current track; conversely, in one exemplary embodiment, a quiet hook played during a loud section of the current track can cause the track volume to be reduced, by cross-fading or otherwise, while the quiet hook plays.
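The dynamic selection described above can be sketched as a gain rule on measured loudness. In this hypothetical Python sketch, loudness values are in dB (for example, short-term RMS levels obtained by DSP, which is not shown), and the aim is to keep the hook a fixed margin, assumed to be 6 dB, above the current passage: a loud hook is turned down without touching the track, while a quiet hook causes the track to be ducked.

```python
from typing import Tuple

def hook_mix_gains(track_db: float, hook_db: float,
                   margin_db: float = 6.0) -> Tuple[float, float]:
    """(track_gain_db, hook_gain_db) adjustments for playing a hook over a track."""
    target = track_db + margin_db  # desired hook level relative to the track
    if hook_db > target:
        return 0.0, target - hook_db  # loud hook: quieten the hook only
    return hook_db - target, 0.0      # quiet hook: duck the track under it
```

For instance, a -10 dB hook over a -30 dB passage is turned down by 14 dB, whereas a -20 dB hook over a -10 dB passage ducks the track by 16 dB.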

In the preferred embodiment, if no track is currently playing, the hooks may be played directly, and each hook may be cross-faded into the next. In another exemplary embodiment, no such cross-fade occurs and the hooks are simply played one after another.
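
Playing a run of hooks with or without cross-fades comes down to scheduling their start times. The sketch below assumes a single fixed cross-fade overlap for every pair of consecutive hooks; the function name and that simplification are hypothetical.

```python
from typing import Sequence


def hook_schedule(durations: Sequence[float], crossfade: float = 0.0) -> list[float]:
    """Return start times (seconds) for playing hooks one after another.

    With crossfade > 0 each hook starts `crossfade` seconds before the previous
    one ends, so consecutive hooks overlap and can be cross-faded; with
    crossfade == 0 the hooks are simply played back to back.
    """
    if crossfade < 0:
        raise ValueError("crossfade must be non-negative")
    if durations and crossfade > min(durations):
        raise ValueError("crossfade cannot exceed the shortest hook")
    starts: list[float] = []
    t = 0.0
    for d in durations:
        starts.append(t)
        t += d - crossfade  # the next hook begins during this one's tail
    return starts


if __name__ == "__main__":
    print(hook_schedule([8.0, 10.0, 6.0], crossfade=2.0))
    print(hook_schedule([8.0, 10.0, 6.0]))
```

The overlapping region of each pair would then be rendered with a gain curve such as an equal-power fade.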

Selecting a track from a set of tracks

In a preferred embodiment, a user-initiated trigger may be used in the AUI while a hook is playing to cause the track from which that hook is taken to be played.

In one exemplary embodiment, the user-initiated trigger is a generic button, such as a "play" button on a GUI or control panel. In another exemplary embodiment, the trigger is a voice command, an eye movement or a visual gesture. In another exemplary embodiment, the trigger is a mouse cursor movement over a visual indicator. In another exemplary embodiment, the trigger is a mouse or finger action on an item of the user interface. In a preferred embodiment, the available triggers depend on the available hardware and on configured user or system preferences.

When playback is triggered, the preferred embodiment plays the remainder of the track onward from the "hook" section and skips the earlier portion of the track ("behavior A"). In another exemplary embodiment, the trigger causes the hook's track to play from its beginning ("behavior B"), with or without a cross-fade from the hook to the start of the track. In another exemplary embodiment, the behavior can be configured by the user, for example by setting a user preference for behavior A or behavior B.

In a preferred embodiment, behavior A occurs when the play button is clicked once, and behavior B occurs when the same button is clicked twice. In another exemplary embodiment, the user can choose between behavior A and behavior B using any other mechanism.
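
The mapping from a play gesture to behavior A or B, and from there to a playback position, can be sketched as follows. The names are hypothetical; the sketch assumes that behavior A resumes from the point the hook preview had reached within the track, and it combines the click-count embodiment with an optional stored user preference, which the text describes as separate alternatives.

```python
from enum import Enum
from typing import Optional


class Behavior(Enum):
    A = "continue_from_hook"  # play the rest of the track onward from the hook
    B = "restart_track"       # play the track from its beginning


def resolve_behavior(click_count: int, preference: Optional[Behavior] = None) -> Behavior:
    """Map a play-button gesture to behavior A or B.

    A stored user preference, if any, takes precedence (an assumption);
    otherwise a single click selects behavior A and a double click selects
    behavior B, as in the preferred embodiment.
    """
    if click_count < 1:
        raise ValueError("click_count must be at least 1")
    if preference is not None:
        return preference
    return Behavior.B if click_count >= 2 else Behavior.A


def playback_start(behavior: Behavior, hook_offset: float, elapsed_in_hook: float) -> float:
    """Position (seconds into the full track) at which full playback begins.

    Assumes behavior A continues from where the hook preview had reached:
    the hook's offset within the track plus the time already played.
    """
    if behavior is Behavior.A:
        return hook_offset + elapsed_in_hook
    return 0.0
```

For example, a single click three seconds into a hook that starts 62.5 seconds into its track would continue full playback at 65.5 seconds.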

Track browsing

In a preferred embodiment, if no track is currently playing and the user browses through tracks or a sequence of tracks, such as a playlist, the hook of each browsed digital media item is played in the background. In the preferred embodiment, "background" means playing audio at a lower volume than normal, and/or rendering video partially transparent or otherwise unobtrusively, and/or using 3D audio effect techniques to place the apparent origin of the sound at a specific point, e.g. behind or beside the listener. In another exemplary embodiment, "background" playback does not alter the volume, the transparency or the apparent spatial origin of the hooks of the tracks being browsed.

In one exemplary embodiment, when the end user browses tracks and sets of tracks by moving the mouse cursor or a finger between icons representing them, playback of the corresponding hooks is triggered and the hooks are cross-faded in synchronism with the movement of the cursor. In another exemplary embodiment, eye tracking can be used to control cursor movement through the interface. In another exemplary embodiment, the cursor is controlled by voice commands or by other mechanisms, such as the tilt control of a motion-sensitive device.
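
Cross-fading hooks in synchronism with the cursor can be sketched by mapping the cursor's position between two neighboring icons onto cross-fade progress. This assumes a one-dimensional layout, a linear position-to-progress mapping and an equal-power gain law; `background_gain` stands in for the reduced "background" volume, and all names and values here are hypothetical.

```python
import math


def browse_gains(cursor_x: float, icon_a_x: float, icon_b_x: float,
                 background_gain: float = 1.0) -> tuple[float, float]:
    """Gains for the hooks of two neighboring icons as the cursor moves between them.

    The cursor position is mapped linearly onto cross-fade progress (0 at
    icon A, 1 at icon B, clamped outside that range) and an equal-power law
    is applied; both gains are scaled by `background_gain`, e.g. a value
    below 1.0 for "background" playback.
    """
    if icon_a_x == icon_b_x:
        raise ValueError("icons must be at different positions")
    p = (cursor_x - icon_a_x) / (icon_b_x - icon_a_x)
    p = min(max(p, 0.0), 1.0)
    return (background_gain * math.cos(p * math.pi / 2),
            background_gain * math.sin(p * math.pi / 2))


if __name__ == "__main__":
    for x in (100, 125, 150, 175, 200):
        print(x, browse_gains(x, icon_a_x=100, icon_b_x=200, background_gain=0.25))
```

Calling this on every cursor-move event keeps the audible balance between the two hooks tied to where the pointer is, which is the synchronism the embodiment describes.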

In a preferred embodiment, during browsing the user can select a track to be played in full in the same manner as described above, for example by pressing "play" while that track's hook is playing.

In that case, in the preferred embodiment, the track associated with that hook becomes the currently playing track, and all other AUI behavior continues as described above.

Slide Show Accompaniment

In one exemplary embodiment, hooks from tracks are gathered together based on any preset criteria, such as mood or genre, and are played as background music in their own right. In another exemplary embodiment, images, whether still or moving, are selected in a similar way using the same or similar criteria or, in a further exemplary embodiment, using other criteria.

The sequence of images and the sequence of music hooks are then played simultaneously to form a background slide show with audio accompaniment.

In a preferred embodiment, a pre-selected set of images is analyzed by DSP to determine its overall "mood" or other desired style, and a sequence of audio hooks with a similar mood, again identified via DSP, is generated to form the audio accompaniment.
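
Once DSP analysis has labelled the images and the hooks with moods, assembling the accompaniment is a matching and selection step. The sketch assumes the mood labels already exist (the analysis itself is out of scope), takes the most frequent image mood as the set's overall mood, and picks matching hooks in order until the slide show is covered; the `Hook` type and function name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hook:
    track_id: str
    mood: str        # mood label, assumed to come from DSP analysis
    duration: float  # seconds


def build_accompaniment(image_moods: Sequence[str], hooks: Sequence[Hook],
                        slideshow_len: float) -> tuple[str, list[Hook]]:
    """Pick hooks that share the images' dominant mood until the slide show is covered."""
    if not image_moods:
        raise ValueError("no image moods to analyze")
    dominant, _ = Counter(image_moods).most_common(1)[0]
    chosen: list[Hook] = []
    total = 0.0
    for hook in hooks:
        if total >= slideshow_len:
            break  # enough music to cover the slide show
        if hook.mood == dominant:
            chosen.append(hook)
            total += hook.duration
    return dominant, chosen
```

A fuller version might score similarity on several continuous features (tempo, energy, valence) rather than requiring an exact label match.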

A la carte purchase

In a preferred embodiment, the playback of each hook is accompanied by a link or button through which the user can purchase the right to play the track associated with that hook on one or more of the user's media player devices.

Integrated sound

In a preferred embodiment, a low-level background sound, such as a hum or a faint beat, is played through the AUI to mask any silences or gaps in playback and/or to provide a constant aural cue that the AUI is in operation.

Accessibility

By providing an audio interface, the AUI offers greater accessibility to blind and partially sighted users.

In a preferred embodiment, visual user interface components that cannot be replaced by the AUI as described above carry mark-up so that they can be rendered using voice narration and/or a braille display. Further, in a preferred embodiment, any such audio narration is treated as the "currently playing track" for the purposes of the invention described above, and hooks are played in such a way that the narration remains clearly audible. For example, during browsing and/or playback the hook can be played "in the background", as described above, so that it does not mask the narration.


Claims

1. A method of managing playback of one or more items of digital media content, for example to ensure a natural transition between items of digital media content, the method comprising:
(a) identifying a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata; and
(b) using the description in a digital media player to automatically control playback of the digital media content.

2. The method of claim 1, wherein the description of a particular item of digital media content includes metadata identifying an important event or characteristic of the item, and wherein the digital media player uses that metadata to automatically control playback.

3. The method of claim 2, wherein the description of a particular item of digital media content is a timeline description that identifies when, or where, an important event of the item occurs.

4. The method of claim 1, wherein the descriptive metadata for a digital media content file includes one or more of: the start point of the actual content in the file; the end point of the actual content in the file; the region or regions of the file containing vocals; the tempo of the media content; the mood of the media content; the pitch of the media content; the "hook" in the content; suitable fade-in and fade-out points; the position of any chorus in the file; the location and type of any beat points in the file; any overlay locations where other content may be overlaid on the digital media content during playback; and any other metadata relevant to controlling playback of the digital media content file.

5. The method of any one of claims 1 to 4, wherein the descriptive metadata for the digital media content file is identified automatically by applying digital signal processing (DSP) techniques to the digital content file, or is identified manually.

6. The method of any one of claims 1 to 5, further comprising the step of generating the description defining how to manage playback, the step being performed automatically, by a tool or tools created for that purpose, manually, or by a combination of these methods.

7. The method of any one of claims 1 to 6, wherein the description of how to manage playback includes, but is not limited to, one or more of: the start point of the actual content in the file; the end point of the actual content in the file; the region or regions of the file containing vocals; the tempo of the media content; the mood of the media content; the pitch of the media content; the "hook" in the content; appropriate fade-in and fade-out points; the position of any chorus in the file; the location and type of any beat points in the file; any overlay locations where other content may be overlaid on the digital media content during playback; and any other metadata relevant to controlling playback of the digital media content file.

8. The method of claim 4 or 7, wherein the "hook" comprises one or more extracted sections of a track of audio and/or video content, the one or more extracted sections being: (i) representative of the track as a whole; or (ii) the most recognizable portion or portions of the track; or (iii) the "best" portions of the track, as defined; or (iv) related to one or more portions of another track, including but not limited to portions of a track that resemble portions of another track, such as tracks that start in a similar way; or (v) reminiscent of a specified track; or (vi) selected according to any combination of the listed criteria.

9. The method of claim 4, 7 or 8, wherein the "hook" is identified using one or more of digital signal processing ("DSP") techniques, manual identification, or any other method.

10. The method of claim 4, 7, 8 or 9, wherein the "hook" comprises one or more hooks from one or more tracks ("hooks per track"), combined into a single hook by at least one of cross-fading, juxtaposition, or any other technique that achieves the desired effect.

11. The method of any one of claims 1 to 10, wherein the description of how to manage playback includes information on one or more of: recommendations or requirements for how the digital media content file is cached on the client device; "fallback" digital media content that may be played in place of the digital media content file if the digital media content file becomes unavailable for any reason; a recommendation or requirement as to which digital media content should be played after the digital media content file; how to apply audio and/or video processing, such as the initial volume to use for playback, how to apply normalization across tracks, or any other playback criterion; overlaying one track on another, selectively or otherwise, such as by defining a commentary track of audio, video or text to be presented with the currently playing track; how to control the tempo and/or pitch of the digital content during playback; any other type of sound processing to use during playback, such as one or more of effects, equalization, volume normalization, compression, or any other audio and/or video processing; managing the presentation of the digital media content to the end user in the client's user interface; and any other metadata relevant to controlling playback of the digital media content file.

12. The method of any one of claims 1 to 11, wherein the description of how to manage playback includes information for managing transitions between two or more items of digital media content, including one or more of: the start and end times of the transition in the first file; the start and end times of the transition "end point" in the second file; which transition effect or combination of transition effects to use; the duration over which to apply any of the transition effects; what gap, if any, to use when transitioning from the first digital media content to the second digital media content; and any other metadata useful for defining transitions between digital content files.

13. The method of claim 12, wherein the transition effects include one or more of linear, S-curve or parametric fading, fade-to-hold, fade-to-…, slow fade, cross-fade, fade cross-fade, the timing of the effect, the duration of the effect, or any other information related to the application of a given transition effect.

14. The method of any one of claims 1 to 13, wherein automatic generation of the description is performed by a software application that uses the identified descriptive metadata for the content to generate a description of an item of digital media content in any standardized format, such as Extensible Markup Language (XML), JavaScript Object Notation (JSON), or any other applicable format.

15. The method of any one of claims 1 to 14, wherein the description defining how to manage playback describes a sequence of one or more items of digital media content and further defines how to manage transitions between the items.

16. The method of claim 15, wherein the description is generated using a software application that produces the description, in any standardized format such as XML, JSON or any other applicable format, from a manually or automatically provided list of digital media content files or from excerpts of those files.

17. The method of any one of claims 1 to 16, wherein the digital media content file itself includes the description defining how to manage playback of one or more items of digital media content.

18. The method of any one of claims 1 to 17, wherein the digital media content file comprises one or more digital media files and/or one or more excerpts from two or more digital media files.

19. The method of any one of claims 1 to 18, wherein the description of how to manage playback of the digital content is used, directly or indirectly, by a digital media player, or by a plug-in to a digital media player, to control playback of the digital media content so as to provide a seamless playback experience for the end user.

20. The method according to any one of claims 1 to 19, wherein the digital media content is digital music content or digital video and audio content.

21. The method of any one of claims 1 to 20, wherein the digital media player is a smartphone or tablet computer.

22. The method of any one of claims 1 to 21, wherein the descriptive metadata further includes an audio end point, which marks the end of the file in that it identifies the portion of the digital media file that contains little or no valid audio content.

23. The method of any one of claims 1 to 22, wherein the descriptive metadata includes the start of the audio element of an audio file.

24. The method of any one of claims 1 to 23, wherein the descriptive metadata comprises one or more of: general definitions; instructions for caching; a fallback playlist; a streaming playlist; and links for requesting more playlist items.

25. The method of any one of claims 1 to 24, wherein the descriptive metadata includes information on one or more of: which tracks to play; the point at which to start playing each track; the point at which to stop playing each track; how to play each track in terms of the audio and/or video processing to apply, such as the initial volume to use for playback, how to apply normalization across tracks, or any other playback criterion; how to transition into and out of each track, such as how to cross-fade between tracks and what gap, if any, to use to smooth transitions; which track to play after a given track, either as a simple track identifier or as a set of selection criteria that the client application can use to choose from a selection of possible "next tracks"; how to handle the case where the "next track" is temporarily or permanently unavailable, such as by providing a pre-cached track to be used instead; how to manage the presentation of a track to the end user in the client's user interface; and how to overlay one track on another, selectively or otherwise, such as by defining a commentary track of audio, video or text to be presented with the track currently being played.

26. The method of any one of claims 1 to 25, wherein, after a session, web player or software app is opened on the digital media player, audio is played only in response to a user-instigated play action.

27. A method of analyzing digital content, comprising:
(a) identifying a collection of digital media files;
(b) performing DSP analysis on the collection of digital media files to automatically identify audio start and end points within the digital media files; and
(c) generating and storing metadata based on the DSP analysis.

28. The method of claim 27, further comprising performing a DSP analysis on the digital media file to automatically identify the tempo and mood of the music in the digital media file.

29. The method of claim 27 or 28, further comprising performing DSP analysis on the digital media files to automatically identify potential overlay points (where audio can be overlaid on the file), to automatically identify the "hook", or to automatically identify any other metadata that can be derived automatically from the analysis.

30. A collection of digital media content files, including an associated description defining how to manage playback of one or more items of digital media content, the description including descriptive metadata.

31. The collection of digital media content files of claim 30, comprising one or more gap files.

32. The software application of claim 14, which generates, in any standardized format such as XML, JSON or any other applicable format, a description of items of digital media content using descriptive metadata identifying features or characteristics of the digital media content, the description being used by the digital media player to control playback.

33. The format of the output from the software application of claim 14.

34. The software application of claim 16.

35. The format of the output from the software application of claim 16.

36. A system comprising a digital media player and a content server, wherein the digital media player is connectable to the content server via a content delivery network, and the content server is operable to provide content delivery to the digital media player in response to calls from the digital media player to the content server, wherein:
(a) a description defining how to manage playback of one or more items of digital media content is identified, the description comprising descriptive metadata; and
(b) the digital media player is operable to use the description to automatically control playback of the digital media content.

37. The system of claim 36, wherein the digital media player is operable to identify a description defining how to manage playback of one or more items of digital media content, the description including descriptive metadata.

38. The system of claim 36, wherein the content server is operable to identify a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata.

39. A system comprising a digital media player, an identification server and a content server, wherein the digital media player, the identification server and the content server are connectable to one another via a content delivery network, and the content server is operable to provide content delivery to the digital media player in response to calls from the digital media player to the content server, wherein:
(a) the identification server is operable to identify a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata;
(b) the identification server is operable to transmit the description to the digital media player; and
(c) the digital media player is operable to use the description to automatically control playback of the digital media content.

40. A system according to any one of claims 36 to 39, operable to implement the method of any one of claims 1 to 29.

41. A digital media player forming part of the system of any one of claims 36 to 40.

42. A content server forming part of the system of any one of claims 36 to 40.

43. An identification server forming part of the system of claim 39.

44. A computer program product operable to perform a method of managing playback of one or more items of digital media content, for example to ensure a natural transition between items of digital media content, the method comprising:
(a) identifying a description defining how to manage playback of one or more items of digital media content, the description comprising descriptive metadata; and
(b) using the description in a digital media player to automatically control playback of the digital media content.

45. The computer program product of claim 44, operable to implement the method of any one of claims 1 to 29.
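
Claims 14, 22 and 27 describe deriving descriptive metadata, such as audio start and end points, by DSP analysis and serializing it in a standardized format such as JSON. The sketch below shows one minimal way to do this. It assumes float PCM samples in [-1.0, 1.0] and a simple windowed-RMS threshold detector; the detector, the -50 dB default threshold and the JSON field names (`audio_start`, `audio_end`, `duration`, `track_id`) are hypothetical choices, not a format defined by this document.

```python
import json
import math
from typing import Optional, Sequence


def find_audio_bounds(samples: Sequence[float], sample_rate: int,
                      threshold_db: float = -50.0,
                      window: int = 1024) -> Optional[tuple[float, float]]:
    """Return (start_s, end_s) spanning every window whose RMS exceeds threshold_db.

    Simple energy-threshold detector: windows quieter than the threshold at
    either end of the file are treated as containing little or no valid
    audio. Returns None if no window exceeds the threshold.
    """
    if window <= 0 or sample_rate <= 0:
        raise ValueError("window and sample_rate must be positive")
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    active = []
    for i in range(0, len(samples), window):
        block = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        if rms > threshold:
            active.append(i)
    if not active:
        return None
    start = active[0] / sample_rate
    end = min(active[-1] + window, len(samples)) / sample_rate
    return start, end


def describe_track(track_id: str, samples: Sequence[float], sample_rate: int,
                   **detector_options) -> str:
    """Serialize DSP-derived playback metadata for one track as JSON."""
    description = {"track_id": track_id, "duration": len(samples) / sample_rate}
    bounds = find_audio_bounds(samples, sample_rate, **detector_options)
    if bounds is not None:
        description["audio_start"], description["audio_end"] = bounds
    return json.dumps(description, sort_keys=True)


if __name__ == "__main__":
    pcm = [0.0] * 300 + [0.5] * 400 + [0.0] * 300  # silence, tone, silence
    print(describe_track("demo", pcm, sample_rate=1000, window=100))
```

The window size sets the resolution of the detected points; a production analyzer would typically refine the boundaries within the first and last active windows and use a perceptual loudness measure rather than raw RMS.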