WO2007096691A2 - Generating a representation of a dancer dancing to music - Google Patents

Generating a representation of a dancer dancing to music

Info

Publication number
WO2007096691A2
Authority
WO
WIPO (PCT)
Prior art keywords
dancer
data structure
music
generating
defines
Prior art date
Application number
PCT/IB2006/000906
Other languages
French (fr)
Inventor
Mikko Heikkinen
Kari Laurila
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/IB2006/000906 priority Critical patent/WO2007096691A2/en
Publication of WO2007096691A2 publication Critical patent/WO2007096691A2/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/205 - 3D [Three Dimensional] animation driven by audio data
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/8047 - Music games

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Description

TITLE
Generating a representation of a dancer dancing to music
FIELD OF THE INVENTION
Embodiments of the present invention relate to generating a representation of a dancer dancing to music. In particular, they relate to methods, systems, devices and computer programs for generating a representation of a dancer dancing to music.
BACKGROUND TO THE INVENTION
Music players may have displays that show abstract animations that look interesting; for example, Windows Media Player and Winamp come bundled with several such animations. The problem with these animations is that, while they look interesting, they are abstract in nature and do not closely relate to the music.
Music downloading is a growth business, but it is extremely competitive. It would therefore be desirable to increase the value associated with a music download and/or music player so that they are more desirable and consequently more valuable.
DEFINITIONS
The term 'dancer' is used in this document to mean an articulated (jointed) object that moves with the rhythm of music. The term 'human dancer' is used to mean an animated person that moves with the rhythm of music. A 'dancer' may be a 'human dancer' or some other type of dancer such as an animal or machine.
BRIEF DESCRIPTION OF THE INVENTION
According to one embodiment of the invention there is provided a method of generating a representation of a dancer dancing to music having beats, the method comprising: storing a first data structure that defines movements of a dancer; storing a second data structure that defines an appearance of a dancer; and generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer.
Although an image representing a dancer is generated with successive beats of the music, image generation may occur more frequently.
The phrase 'generating, with successive beats of the music, an image' encompasses the generation of an image at a beat but also generation of an image close to a beat. Close in this context means at the first point immediately preceding a beat or immediately following a beat at which an image is generated according to a predetermined schedule.
The first data structure may have a standard format that enables the exchange of one first data structure with another first data structure to change the movements of the represented dancer.
The second data structure may also have a standard format that enables the exchange of one second data structure with another second data structure to change the appearance of the dancer representation.
The first data structure may be portable. The second data structure may be portable. The first data structure and the second data structure may be independently portable.
The first data structure may be editable by a user. The second data structure may be editable by a user. The first data structure and the second data structure may be independently editable by a user.
The first data structure may comprise structured data specifying, for each one of a plurality of time instances, the position in space of each of a plurality of predetermined dancer body parts at the time instance. The time instances may be regularly spaced or may have variable spacing. The music may have a music tempo defined by its beats. The first data structure may define a plurality of poses adopted at first intervals by a dancer as the dancer dances with a dance tempo. The generation of a new image representing a dancer may occur after each second time interval, where the second interval is the first interval scaled by the ratio of the music tempo to the dance tempo. The first interval may be regular or variable.
The generated images may be dependent upon a telephone caller identifier. The generated images may be dependent upon ambient music. Ambient music means music that is present in the surroundings of the system generating the representation of a dancer that has not been produced by the system.
According to one embodiment of the invention there is provided a system for generating a representation of a dancer dancing to music having beats, the system comprising: a memory storing a first data structure that defines movements of a dancer and a second data structure that defines an appearance of a dancer; and a generator for generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer.
Although an image representing a dancer is generated with successive beats of the music, image generation may occur more frequently.
According to one embodiment of the invention there is provided a computer program for generating a representation of a dancer dancing to music having beats, comprising program instructions which, when loaded into a processor, provide: means for accessing data within a first data structure that defines movements of a dancer; means for accessing data within a second data structure that defines an appearance of a dancer; and means for generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer. Although an image representing a dancer is generated with successive beats of the music, image generation may occur more frequently.
According to another embodiment of the invention there is provided a method of generating a representation of a dancer dancing to music having a music tempo, the method comprising: storing data that defines a plurality of poses adopted at first intervals by a dancer as the dancer dances with a dance tempo; and generating, using the stored data, a new image representing a dancer after each second time interval, wherein the second interval is the first interval scaled by the ratio of the music tempo to the dance tempo.
According to a further embodiment of the invention there is provided a method of generating a representation of a dancer dancing to music comprising: storing data that defines a plurality of poses adopted by a dancer as the dancer dances; and generating, using the stored data, a series of images representing a dancer wherein the generated images are dependent upon detected environment information.
The detected environment information may be a telephone caller identifier or ambient music, for example.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present invention, reference will now be made, by way of example only, to the accompanying drawings in which:
Fig. 1 schematically illustrates a system for controlling a computer generated visualization of a dancer; and
Fig. 2 schematically illustrates a method of generating a representation of a dancer dancing to music.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Figure 1 schematically illustrates a system 10 for controlling a computer generated visualization of a dancer. The system comprises: a processor 2, a display 8 and a memory 4 storing computer program instructions 6, a dance model database 12 and a dancer model database 14.
The processor 2 is arranged to write to and read from the memory 4 and to control the output of the display 8.
The computer program instructions 6 define a dancer visualization software application. The computer program instructions 6, when loaded into the processor 2, provide the logic and routines that enable the system 10 to perform the method illustrated in Fig. 2.
The computer program instructions 6 may arrive at the electronic device via an electromagnetic carrier signal or be copied from a physical entity such as a computer program product, a memory device or a record medium such as a CD-ROM or DVD.
The system 10 will typically be part of an electronic device such as a personal digital assistant, a personal computer, a mobile cellular telephone, a personal music player etc.
The dance model database 12 stores a plurality of independent dance models as independent data structures 13.
A dance model defines a 'dance clip', which is an ordered sequence of poses adopted by a dancer. A pose can be defined using a dance vector v to specify the position in space of each of a plurality of predetermined dancer body parts. An ordered set of dance vectors {v: v(1), v(2), v(3), ..., v(m)} that has a vector v(i) for each one of a plurality of regularly spaced time instances i is used to define the 'dance clip'. The dance model also records information about the rate at which poses change (the dance tempo DT) and information about the time interval separating sequential vectors in the ordered set of vectors (the sample rate SR).
A dance model data structure 13 is schematically illustrated in Fig. 2. It typically comprises the following fields: a) a dance tempo (DTo) field that defines the tempo of the dance clip; b) a dance length (DL) field that specifies the length of the dance clip; c) a movement field for the set of vectors {v}; and d) a sample rate field for the sample rate (SR).
If there are m vectors in the set of vectors {v} then m * SR = DL. It should therefore be appreciated that if m is known or can be determined, only one of DL and SR is required.
The dance model data structure 13 may also comprise additional fields that, for example, identify musical start and end times or contain musical time signature information, dance style information or music style information.
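For concreteness, a dance model data structure 13 along these lines could be sketched as follows. This is a minimal illustration only: the patent does not prescribe a concrete file format, and all type and field names below are hypothetical. SR is taken, per the text above, as the time spacing between successive dance vectors.

```python
from dataclasses import dataclass
from typing import List

# One dance vector: a position value for each coordinate of each
# predetermined body part of the standard skeleton.
DanceVector = List[float]

@dataclass
class DanceModel:
    """Hypothetical sketch of a dance model data structure 13."""
    dance_tempo: float            # DTo: tempo of the dance clip, e.g. in BPM
    dance_length: float           # DL: length of the dance clip
    vectors: List[DanceVector]    # the ordered set {v: v(1), ..., v(m)}
    sample_rate: float            # SR: time spacing of successive vectors
    skeleton_type: str = "human"  # optional: which standard skeleton is used
    style: str = ""               # optional: dance/music style information

    def __post_init__(self):
        # Consistency check from the text above: m * SR = DL.
        assert abs(len(self.vectors) * self.sample_rate - self.dance_length) < 1e-6
```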
It should be appreciated that a common standard 'skeleton' model is used as a basis for each dance model. That is, the same predetermined body parts are specified by the components of a vector v(i) in different dance models, i.e. the vectors v of each dance model have the same format and span the same vector space. It should also be appreciated that it is not necessarily essential to provide a value for every component of a vector v(i), as it may be possible to interpolate or generate components.
Different articulated objects will require different common standard models; that is, a common standard model is required for each different 'skeleton'. For example, a human dancer will have a different skeleton to a dog dancer, which will have a different skeleton to an earth excavator dancer. It is therefore desirable to use variable standard skeletons, where the dance model indicates which one of the standard skeletons is being used.
The standard skeleton model enables a dancer's movement to be determined by selection of an appropriate dance model data structure 13. The exchange of a current dance model data structure for another different dance model data structure changes the dancer's movement but does not affect the dancer's appearance. Each dance model data structure 13 can be transferred independently into and out of the database 12 as illustrated by transfer block 30 in Fig. 2. A data structure 13 can, for example, be downloaded from a web-site, uploaded to a website, transferred from one device or storage device to another etc. Each dance model data structure 13 and therefore each dance model is therefore independently portable.
A new dance model can be created by a user by creating a new dance model data structure 13 and storing it in the dance model database 12. For example, the dance vectors can be created using motion capture hardware and software or alternatively generated with a computer program.
Also, an existing dance model may be varied by editing the existing dance model data structure 13 for that dance model and saving the new data structure in the dance model database 12. This functionality is schematically illustrated in Fig. 2 by block 32.
The dancer model database 14 stores a plurality of independent dancer models as independent data structures 15.
A dancer model defines the visual appearance of a dancer.
The dancer model determines the body generated around the moving skeleton, defined by the dance model. The body may be generated in any one of a number of standard fashions such as using interconnected polygons. As the 'skeleton' is a common standard model, it is possible to use any dancer model with any dance model. This allows one to 'mix and match' any dance model with any dancer model.
A particular dancer model is associated with a particular standard skeleton. If there are a number of variable standard skeletons, then the dancer model should identify the skeleton type it is suitable for so that dance and dancer models can be matched.
A dancer model data structure 15 may typically define one or more of the following attributes of a dancer: male/female; ethnicity; dress style; facial characteristics. It should be appreciated that a common standard 'appearance' model is used as a basis for each dancer model. That is, there is a semantic convention for specifying the dancer attributes.
The standard appearance model enables a dancer's appearance to be determined by selection of an appropriate dancer model data structure 15. The exchange of a current dancer model data structure for another different dancer model data structure changes the dancer's appearance but does not affect the dancer's movement.
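A dancer model data structure 15 could be sketched in the same spirit (again purely illustrative; the attribute names are hypothetical and the patent does not fix a representation):

```python
from dataclasses import dataclass

@dataclass
class DancerModel:
    """Hypothetical sketch of a dancer model data structure 15."""
    skeleton_type: str        # which standard skeleton the model fits
    sex: str                  # e.g. 'male' or 'female'
    ethnicity: str
    dress_style: str
    facial_texture: bytes     # e.g. a face texture for the skinned mesh

def models_match(dance: "DanceModel", dancer: DancerModel) -> bool:
    # Dance and dancer models may be mixed and matched freely,
    # provided they refer to the same standard skeleton.
    return dance.skeleton_type == dancer.skeleton_type
```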
Each dancer model data structure 15 can be transferred independently into and out of the database 14 as illustrated by transfer block 40 in Fig. 2. A data structure 15 can, for example, be downloaded from a web-site, uploaded to a website, transferred from one device or storage device to another etc. Each dancer model data structure 15 and therefore each dancer model is therefore independently portable.
A new dancer model can be created by a user by creating a new dancer model data structure 15 and storing it in the dancer model database 14. Also, an existing dancer model may be varied by editing the existing dancer model data structure 15 for that dancer model and saving the new data structure 15 in the dancer model database 14. This functionality is schematically illustrated in Fig. 2 by block 42.
A method of generating a representation of a dancer dancing to music in the display 8 by the processor 2 is illustrated in Fig. 2.
The method requires the definition of a 'current' dancer model and a 'current' dance model. These current models are used to generate a representation of a dancer dancing to music. The current dance model defines the dancer's movement and the current dancer model defines the dancer's appearance.
The selection of the current dance model is schematically illustrated at block 50 in Fig. 2. The selection may be based upon current context information 51. The context information may be, for example, a user input command 53 that selects or specifies the current dance model. The selection may alternatively be automatic, that is, without user intervention. The context information may be, for example, metadata 52 provided with the music track or derived by processing the music track. This metadata may indicate the music genre, keywords from the lyrics, time signature etc. The automatic selection of the current dance model may be based on the metadata. The context information may be, for example, environmental information that is detected from radio or sound waves in the environment of the system 10. For example, the context environmental information may be metadata derived by processing ambient music detected via a microphone 54. This metadata may indicate the music genre, keywords from the lyrics detected using voice recognition, time signature etc. The automatic selection of the current dance model may be based on the metadata. The context environmental information may be, for example, a call line identifier (CLI) received via a cellular radio transceiver 55 within a paging signal for an incoming telephone call. The automatic selection of the current dance model may be based on the CLI.
The selection of the current dancer model is schematically illustrated at block 60 in Fig. 2. The selection may be based upon current context information 61. The context information may be, for example, a user input command 63 that selects or specifies the current dancer model. The selection may alternatively be automatic, that is, without user intervention. The context information may be, for example, metadata 62 provided with the music track or derived by processing the music track. This metadata may indicate the music genre, keywords from the lyrics etc. The automatic selection of the current dancer model may be based on the metadata. The context information may be, for example, environmental information that is detected from radio or sound waves in the environment of the system 10. For example, the context environmental information may be metadata derived by processing ambient music detected via a microphone 54. This metadata may indicate the music genre, keywords from the lyrics, time signature etc. The automatic selection of the current dancer model may be based on the metadata. The context environmental information may be, for example, a call line identifier (CLI) received via a cellular radio transceiver 55 within a paging signal for an incoming telephone call. The automatic selection of the current dancer model may be based on the CLI.
The method 100 of generating a representation of a dancer dancing to music starts at step 20 in Fig. 2. The process then moves to step 22, where the tempo of the music track is obtained. The tempo is typically expressed in beats per minute. The music tempo may be provided with the music track as metadata, derived from the music or input by the user. Derivation of the music tempo is suitable when the music is produced from a stored music track and also when the music is ambient music produced by a third party.
The tempo information can be derived automatically using digital signal processing techniques. There are known solutions for extracting beat information from an acoustic signal, e.g.
Goto [Goto, M., Muraoka, Y. (1994). "A Beat Tracking System for Acoustic Signals of Music," Proceedings of the ACM International Conference on Multimedia, San Francisco, CA, USA, pp. 365-372],
Klapuri [Klapuri, A.P., Eronen, A.J., Astola, J.T. (2006). "Analysis of the meter of acoustic musical signals," IEEE Transactions on Audio, Speech, and Language Processing 14(1), pp. 342-355],
Seppänen [Seppänen, J. (2001). Computational Models of Musical Meter Recognition, M.Sc. thesis, Tampere University of Technology (TUT)], and
Scheirer [Scheirer, E.D. (1998). "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America 103(1), pp. 588-601].
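As a rough, self-contained illustration of this kind of processing (a deliberately simplified sketch, not the method of any of the papers cited above), the tempo of a mono audio buffer can be estimated by autocorrelating an energy-based onset envelope:

```python
import numpy as np

def estimate_tempo(samples: np.ndarray, sample_rate_hz: int) -> float:
    """Crude tempo estimate in beats per minute (illustrative only)."""
    hop = 512
    # Short-time energy envelope over hop-sized frames.
    frames = samples[: len(samples) // hop * hop].reshape(-1, hop)
    energy = (frames.astype(np.float64) ** 2).sum(axis=1)
    # Onset strength: rectified frame-to-frame energy increase.
    onset = np.maximum(np.diff(energy), 0.0)
    onset -= onset.mean()
    # Autocorrelation, searched over lags corresponding to 60-180 BPM.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    frame_rate = sample_rate_hz / hop
    lags = np.arange(int(frame_rate * 60 / 180), int(frame_rate) + 1)
    best_lag = lags[np.argmax(ac[lags])]
    return 60.0 * frame_rate / best_lag
```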
The process then moves to step 24, where the current dance time is calculated. This represents the time into the dance clip defined by the current dance model data structure 13:

DanceTime = (MusicTime * MusicTempo / DanceTempo) mod DanceLength
where DanceTempo is DTo obtained from the current dance model, DanceLength is DL obtained from the current dance model, MusicTempo is the tempo obtained in step 22, and MusicTime is the amount of time elapsed since the music track started, i.e. the time from step 20.
The process next moves on to step 25, where the correct frame of the dance clip is determined, i.e. the correct index i for the dance vector v(i) is determined using the sample rate SR obtained from the current dance model:
i = floor(DanceTime / SR) mod m
The process then moves on to step 26, where the dance vector v(i) is obtained from the current dance model data structure 13. This current dance vector v(i) is then passed to step 27, where the dancer is visualized on the display 8.
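Expressed in code, steps 24 to 26 might look like the following sketch, building on the hypothetical DanceModel above, with SR treated (as in the text) as the time spacing of the dance vectors:

```python
def dance_time(music_time: float, music_tempo: float, dance: DanceModel) -> float:
    # Step 24: advance through the dance clip at a rate scaled by the
    # tempo ratio, wrapping around so the clip loops.
    return (music_time * music_tempo / dance.dance_tempo) % dance.dance_length

def frame_index(t: float, dance: DanceModel) -> int:
    # Step 25: convert the dance time into an index into {v}.
    return int(t / dance.sample_rate) % len(dance.vectors)

def current_pose(music_time: float, music_tempo: float,
                 dance: DanceModel) -> DanceVector:
    # Step 26: fetch the dance vector defining the current pose.
    return dance.vectors[frame_index(dance_time(music_time, music_tempo, dance),
                                     dance)]
```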
The current dance vector v(i) defines the current pose of the dancer skeleton and the current dancer model provides appearance data that adds body and appearance to the posed skeleton. The visualization uses skinned mesh models, which are 3D models composed of polygons and textures around a skeleton model and which can be easily animated.
Consequently, the method 100 generates, with at least each beat of the music, an image representing a dancer. The first data structure determines the pose of the represented dancer and the second data structure determines the appearance of the represented dancer. The dancer moves in sync with the musical beat.
The system 10 may also be used as a music player. In this embodiment, the music track may be stored in the memory 4. Computer program instructions, when loaded into the processor 2, enable the functionality of a music player, as is well known in the art. The music player processes the music track and produces an audio control signal which is provided to an audio output device to play the music. The music player is responsible for the audio playback, i.e. it reads the music songs and renders them to audio. The music player also provides the visualization software with information about transport controls (play, stop, pause, etc.) and the music clock, i.e. the current song position. This maintains synchronisation between the rendered dancer and the rendered music.
The preceding explanation of an embodiment of the invention used dance vectors v(1), v(2), v(3) ... v(m) to represent poses of the dancer skeleton at regularly spaced time intervals. In other embodiments, the time intervals between the dance vectors v(1), v(2), v(3) ... v(m) are not regular, but variable. In this embodiment, the dance model data structure is modified. It no longer comprises a sample rate (SR) field, as there is no constant sample rate, but instead has a time vector T whose values specify a dance time t(i) for each of the dance vectors v(i) in the set {v} and, optionally, an end time t(m+1) that specifies the end time of the dance. The end time t(m+1), when used, can replace the dance length (DL) field in the dance model data structure 13. At step 25 of the process illustrated in Fig. 2, where the correct frame of the dance clip is determined, the correct index i for the dance vector v(i) is determined by selecting the largest value of i for which t(i) ≤ DanceTime.
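With a time vector T in place of a constant sample rate, the frame lookup of step 25 becomes a search for the keyframe bracketing the current dance time, for example by bisection (same hypothetical setting as the sketches above):

```python
import bisect

def frame_index_variable(t: float, times: list) -> int:
    # times holds t(1) ... t(m) (and optionally the end time t(m+1)) in
    # increasing order; return the largest index i with times[i] <= t.
    return max(bisect.bisect_right(times, t) - 1, 0)
```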
In embodiments of the invention, visualization of a dancer does not have to happen only for the sampled dancer positions, i.e. at the poses defined by the dance vectors. The dance vectors may be used to define keyframes, and the movement of the dancer between the keyframes may be animated by interpolation, as sketched below. The animation is generated based on the dance vectors, but the in-between poses are generated on the fly.
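A minimal sketch of such keyframe interpolation, linearly blending the two dance vectors bracketing the current dance time (a real system would more likely interpolate joint rotations, e.g. with quaternions):

```python
import bisect

def interpolated_pose(t: float, times: list, vectors: list) -> list:
    # Blend the keyframe poses on either side of dance time t.
    i = max(bisect.bisect_right(times, t) - 1, 0)
    if i >= len(vectors) - 1:
        return vectors[-1]
    alpha = (t - times[i]) / (times[i + 1] - times[i])
    return [(1.0 - alpha) * a + alpha * b
            for a, b in zip(vectors[i], vectors[i + 1])]
```

The following use cases further exemplify applications of embodiments of the invention: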
Use case 1, bundled music download: A user downloads the latest Madonna single music track to her mobile phone. The user receives the song as an MP3 file or similar and, because the user has a phone model that includes a dancer visualization application, she also receives a new dancer model and dance model that are well suited to the song. The dancer model defines the dancer's appearance so that the dancer resembles Madonna's appearance in the music track's video, and the dance model defines the same dance moves as used in the video.
Use case 2, personalized dancer and user community: A user A downloads a new dancer model from a web service. She then modifies the dancer model using content creation tools available on the same web site. She also creates a new dance model for the dancer using the same tool. The next day at school, she and her friends exchange the new dance models they have created.
Use case 3, personalized ringing tone animation: A dancer visualization is used in connection with ringing tones or musical alerts. The dancer is customized based on the caller ID. The mobile telephone typically has a contacts database with a plurality of entries. An entry may associate together information relating to a particular contact. For example, there may be a name field, a telephone number field and other fields. The mobile telephone, when it receives a telephone number as a caller ID, is able to search the contacts database and retrieve a predetermined field or fields from the contact entry that contains the searched-for telephone number. It may therefore be possible for the mobile telephone to identify the sex of the caller, either from a field that explicitly identifies a contact's sex or by inference from the contact's name. If the caller is a man, a male dancer may be shown and if the caller is a woman, then a female dancer may be shown. More detailed customization is also possible. For example, a customised dancer model may be used by mapping a photograph of the caller, associated with a field within the caller's contact entry, to the dancer model associated with the musical alert, so that the dancer appears to have the caller's facial features. A customised dance model, again associated with the caller's contact entry, may also be used so that the dancer for that caller has some special moves.
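A sketch of how such caller-driven customization could be wired up, reusing the hypothetical DancerModel above (the contact schema and field names are invented for illustration; the patent does not mandate any particular contacts database layout):

```python
from dataclasses import replace

def dancer_model_for_caller(caller_id: str, contacts: dict,
                            default: DancerModel) -> DancerModel:
    # Look the caller up by telephone number in the contacts database.
    entry = contacts.get(caller_id)
    if entry is None:
        return default                              # unknown caller
    model = entry.get("dancer_model", default)      # per-contact override
    photo = entry.get("photo")
    if photo is not None:
        # Map the caller's photograph onto the dancer's face texture.
        model = replace(model, facial_texture=photo)
    return model
```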
Use case 4, visualization of microphone input: A user walks into a restaurant that has a dance floor and some music playing in the background. He starts the dancer visualization application, which starts to analyze the background music through the device's microphone 54. The application detects 22 the beat from the background music and animates the dancer in sync with the music that is playing in the club.
Although embodiments of the present invention have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the invention as claimed.
Whilst endeavoring in the foregoing specification to draw attention to those features of the invention believed to be of particular importance it should be understood that the Applicant claims protection in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not particular emphasis has been placed thereon.
I/we claim:

Claims

1. A method of generating a representation of a dancer dancing to music having beats, the method comprising: storing a first data structure that defines movements of a dancer; storing a second data structure that defines an appearance of a dancer; and generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer.
2. A method as claimed in claim 1, wherein the first data structure is selected from a plurality of first data structures, each of which defines movements of a dancer.
3. A method as claimed in claim 2, wherein each first data structure has a standard format that enables the exchange of one first data structure with another first data structure to change the movements of the represented dancer.
4. A method as claimed in any preceding claim, wherein the first data structure is portable.
5. A method as claimed in any preceding claim, wherein the first data structure is editable by a user.
6. A method as claimed in any preceding claim, wherein the first data structure defines an ordered sequence of poses adopted by a dancer representation.
7. A method as claimed in any preceding claim, wherein the first data structure comprises data specifying, for each one of a plurality of time instances, the position in space of each of a plurality of predetermined dancer body parts at the time instance.
8. A method as claimed in any preceding claim, wherein the second data structure is selected from a plurality of second data structures each of which defines an appearance of a dancer representation.
9. A method as claimed in claim 8, wherein each second data structure has a standard format that enables the exchange of one second data structure with another second data structure to change the appearance of the dancer representation.
10. A method as claimed in any preceding claim wherein the second data structure is portable.
11. A method as claimed in any preceding claim wherein the second data structure is portable independently of the first data structure.
12. A method as claimed in any preceding claim, wherein the second data structure is editable by a user.
13. A method as claimed in any preceding claim, wherein the second data structure is editable independently of the first data structure by a user.
14. A method as claimed in any preceding claim, wherein the music has a music tempo defined by its beats, the first data structure defines a plurality of poses adopted at first intervals by a dancer as the dancer dances with a dance tempo, and the generation of a new image representing a dancer occurs after each second time interval, wherein the second interval is the first interval scaled by the ratio of the music tempo to the dance tempo.
15. A method as claimed in any preceding claim, wherein the generated images are dependent upon detected environment information.
16. A method as claimed in any preceding claim, wherein the generated images are dependent upon a telephone caller identifier.
17. A method as claimed in any preceding claim, wherein the generated images are dependent upon ambient music.
18. A computer program for performing the method of any preceding claim.
19. A system for generating a representation of a dancer dancing to music having beats, the system comprising: a memory storing a first data structure that defines movements of a dancer and a second data structure that defines an appearance of a dancer; and a generator for generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer.
20. A mobile cellular telephone comprising the system as claimed in claim 19.
21. A mobile music player comprising the system as claimed in claim 19.
22. A computer program for generating a representation of a dancer dancing to music having beats, comprising program instructions which when loaded into a processor provide: means for accessing data within a first data structure that defines movements of a dancer; means for accessing data within a second data structure that defines an appearance of a dancer; and means for generating, with successive beats of the music, an image representing a dancer wherein the first data structure determines a pose of the represented dancer and the second data structure determines the appearance of the represented dancer.
23. A method of generating a representation of a dancer dancing to music having a music tempo, the method comprising: storing data that defines a plurality of poses adopted at first intervals by a dancer as the dancer dances with a dance tempo; and generating, using the stored data, a new image representing a dancer after each second time interval, wherein the second interval is the first interval scaled by the ratio of the music tempo to the dance tempo.
24. A computer program for generating a representation of a dancer dancing to music having a music tempo, comprising program instructions which when loaded into a processor provide: means for accessing stored data that defines a plurality of poses adopted at first intervals by a dancer as the dancer dances with a dance tempo; and means for generating, using the stored data, a new image representing a dancer after each second time interval, wherein the second interval is the first interval scaled by the ratio of the music tempo to the dance tempo.
25. A method of generating a representation of a dancer dancing to music comprising: storing data that defines a plurality of poses adopted by a dancer as the dancer dances; and generating, using the stored data, a series of images representing a dancer wherein the generated images are dependent upon detected environment information.
26. A method as claimed in claim 25, wherein the detected environment information is a telephone caller identifier.
27. A method as claimed in claim 25, wherein the detected environment information is ambient music.
28. A computer program for generating a representation of a dancer dancing to music, comprising program instructions which when loaded into a processor provide: means for accessing stored data that defines a plurality of poses adopted by a dancer as the dancer dances; and means for generating, using the stored data, a series of images representing a dancer wherein the generated images are dependent upon detected environment information.
PCT/IB2006/000906 2006-02-21 2006-02-21 Generating a representation of a dancer dancing to music WO2007096691A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2006/000906 WO2007096691A2 (en) 2006-02-21 2006-02-21 Generating a representation of a dancer dancing to music

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2006/000906 WO2007096691A2 (en) 2006-02-21 2006-02-21 Generating a representation of a dancer dancing to music

Publications (1)

Publication Number Publication Date
WO2007096691A2 (en) 2007-08-30

Family

ID=38437729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/000906 WO2007096691A2 (en) 2006-02-21 2006-02-21 Generating a representation of a dancer dancing to music

Country Status (1)

Country Link
WO (1) WO2007096691A2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
CN108769769A (en) * 2018-05-30 2018-11-06 北京小米移动软件有限公司 Playback method, device and the computer readable storage medium of video
CN109936762A (en) * 2019-01-12 2019-06-25 河南图灵实验室信息技术有限公司 The method and electronic equipment that similar audio or video file are played simultaneously
CN109936762B (en) * 2019-01-12 2021-06-25 河南图灵实验室信息技术有限公司 Method for synchronously playing similar audio or video files and electronic equipment
CN110139143A (en) * 2019-05-23 2019-08-16 广州酷狗计算机科技有限公司 Virtual objects display methods, device, computer equipment and storage medium
CN110139143B (en) * 2019-05-23 2021-11-02 广州酷狗计算机科技有限公司 Virtual article display method, device, computer equipment and storage medium
CN110955786A (en) * 2019-11-29 2020-04-03 网易(杭州)网络有限公司 Dance action data generation method and device
CN110992449A (en) * 2019-11-29 2020-04-10 网易(杭州)网络有限公司 Dance action synthesis method, device, equipment and storage medium
CN110992449B (en) * 2019-11-29 2023-04-18 网易(杭州)网络有限公司 Dance action synthesis method, device, equipment and storage medium
CN110955786B (en) * 2019-11-29 2023-10-27 网易(杭州)网络有限公司 Dance action data generation method and device
CN114401439A (en) * 2022-02-10 2022-04-26 腾讯音乐娱乐科技(深圳)有限公司 Dance video generation method, equipment and storage medium
CN114401439B (en) * 2022-02-10 2024-03-19 腾讯音乐娱乐科技(深圳)有限公司 Dance video generation method, device and storage medium

Similar Documents

Publication Publication Date Title
CN109462776B (en) Video special effect adding method and device, terminal equipment and storage medium
CA2650612C (en) An adaptive user interface
WO2007096691A2 (en) Generating a representation of a dancer dancing to music
KR20070008238A (en) Apparatus and method of music synchronization based on dancing
EP2338271A1 (en) Method and apparatus for generating a sequence of a plurality of images to be displayed whilst accompanied by audio
CN112269898A (en) Background music obtaining method and device, electronic equipment and readable storage medium
EP2442299B1 (en) Information processing apparatus, information processing method, and program
CN110309327A (en) Audio generation method, device and the generating means for audio
JP2006268100A (en) Play list generation device, play list generation method, program, and recording medium
US20240194225A1 (en) Selecting supplemental audio segments based on video analysis
CN111460231A (en) Electronic device, search method for electronic device, and medium
WO2024078293A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN113450804A (en) Voice visualization method and device, projection equipment and computer readable storage medium
CN112435641A (en) Audio processing method and device, computer equipment and storage medium
CN104010063B (en) The display methods and equipment of mobile terminal audible ringing information
JP5044503B2 (en) Effect image playback device, effect image playback method, effect image playback program, and recording medium
CN114401439B (en) Dance video generation method, device and storage medium
JP2008523759A (en) Method and system for synthesizing video messages
CN116189279A (en) Method, device and storage medium for determining hand motion of virtual person
CN114974184A (en) Audio production method and device, terminal equipment and readable storage medium
JP2008299631A (en) Content retrieval device, content retrieval method and content retrieval program
CN110209870A (en) Music log generation method, device, medium and calculating equipment
Kim et al. Motion control of a dancing character with music
JP3427970B2 (en) Video editing method and apparatus with environmental sounds using onomatopoeia and recording medium storing video editing program
CN112015945B (en) Method, system and device for displaying expression image on sound box in real time

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06727496

Country of ref document: EP

Kind code of ref document: A2