WO2008132265A1 - Modifying audiovisual output in a karaoke system based on performance context - Google Patents

Publication number
WO2008132265A1
Authority
WO
WIPO (PCT)
Application number
PCT/FI2007/000113
Other languages
French (fr)
Inventor
Juha Arrasvuori
Timo Kosonen
Arto Lehtiniemi
Antti Eronen
Original Assignee
Nokia Corporation
Application filed by Nokia Corporation
Priority to CN200780052706.0A (publication CN101652808A)
Priority to PCT/FI2007/000113
Publication of WO2008132265A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/368 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/046 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/005 Non-interactive screen display of musical or status data
    • G10H2220/011 Lyrics displays, e.g. for karaoke applications

Definitions

  • the invention generally relates to multimedia entertainment systems.
  • the invention relates to karaoke systems.
  • Karaoke is a form of entertainment, originating in Japan, in which one or more singers, typically amateurs, sing along with recorded music on microphone. Often, the music is of a popular or well-known song with the voice of the original singer absent or reduced in volume. Lyrics are usually also displayed to the performer or performers, e.g. on a music video, to guide the sing-along.
  • a conventional karaoke system contains at least one microphone, loudspeakers, an audio amplifier and/or a mixer optionally equipped with an effects unit for creating e.g. reverberation effects, a video display for the performer, as well as some kind of an audiovisual playback system, such as e.g. a video playback system or a local or remote computer-based system with video playback capability.
  • the conventional karaoke system may also include multiple video screens for the audience and a video camera used for displaying the performer on the video screens.
  • karaoke systems typically have a visual method for indicating on the performer's video display (and sometimes also on the audience video screens) which word of the lyrics should be sung at a given moment, to further guide the sing-along.
  • visual methods include e.g. changing the colour of the lyrics in synchronization with the music being played back, and a ball that bounces on top of the current words in synchronization with the music being played back.
  • the music is typically recorded in a multi-track format comprising discrete soundtracks, e.g. each with its own instrument.
  • the music works may be instrumental versions.
  • the music works may also contain a soundtrack with lead vocals wherein the lead vocals soundtrack is muted or suppressed while the performer is singing.
  • a karaoke song or music work comprises several soundtracks, most or all of which are instrumental background music soundtracks.
  • the karaoke song or music work may comprise a background vocals soundtrack as well as a lead vocals soundtrack.
  • the instrumental background music soundtracks may be prerecorded ones, such as e.g. digital audio soundtracks, or the instrumental background music soundtracks may be synthesized ones, such as e.g. MIDI (Musical Instrument Digital Interface) soundtracks.
  • prior art karaoke systems provide very limited - if any - options for the performer and the audience to interact with the karaoke system in order to control or modify audiovisual output of a karaoke performance. For example, while the performer may sing whatever he/she chooses, the original lyrics are still shown on the video display and screens. That is, prior art fails to teach or suggest adapting the displayed lyrics on a situation-by-situation basis. Similarly, prior art fails to teach or suggest adapting or modifying the instrumental background music soundtracks interactively based on context or circumstances around a current karaoke performance.
  • a first aspect of the present invention is a method in which context information related to a current karaoke performance is obtained. In response, at least one audiovisual output aspect of the current karaoke performance is modified based on the obtained context information.
  • a second aspect of the present invention is an apparatus which comprises a context information obtainer configured to obtain context information related to a current karaoke performance.
  • the apparatus of the second aspect further comprises a karaoke output modifier configured to modify at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
  • a third aspect of the present invention is an apparatus which comprises a context information obtaining means for obtaining context information related to a current karaoke performance.
  • the apparatus of the third aspect further comprises a karaoke output modifying means for modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
  • a fourth aspect of the present invention is a computer program embodied on a computer readable medium, the computer program controlling a data-processing device to perform: obtaining context information related to a current karaoke performance, and modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
  • a fifth aspect of the present invention is a system which comprises at least one microphone configured to receive vocalization by at least one karaoke performer performing a current karaoke performance.
  • the system of the fifth aspect further comprises a karaoke device configured to produce a karaoke signal comprising a video portion and an audio portion, and to mix the vocalization received by the microphone with the audio portion.
  • the system of the fifth aspect further comprises at least one speaker configured to output the mixed audio portion, as well as a display configured to output the video portion.
  • the system of the fifth aspect further comprises a context information obtainer configured to obtain context information related to the current karaoke performance.
  • the system of the fifth aspect further comprises a karaoke output modifier configured to modify at least one of the video portion and the mixed audio portion based on the obtained context information.
  • the method of the first aspect is performed by a data-processing device controlled by a computer program embodied on a computer readable medium.
  • words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized.
  • the context information is obtained by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
  • the context information is obtained by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.
  • the context information is obtained as external input data.
  • the context information is obtained as the external input data from an external device.
  • the context information is obtained as external performance sensor data.
  • the context information is obtained as personal information manager data associated with at least one participant of the current karaoke performance.
  • the at least one audiovisual output aspect is modified by adapting at least one soundtrack of the current karaoke performance based on the obtained context information.
  • the at least one audiovisual output aspect is modified by substituting at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
  • the invention allows the performer and the audience to interact with the karaoke system in order to control or modify audiovisual output of a karaoke performance.
  • the invention allows modifying displayed lyrics on a situation-by-situation basis.
  • the invention allows modifying instrumental background music soundtracks interactively based on context or circumstances around a current karaoke performance.
  • Fig. 1 is a block diagram illustrating a karaoke system according to an embodiment of the invention.
  • Figs. 2a-2e are flow diagrams illustrating various methods according to embodiments of the invention.
  • Figure 1 is a block diagram illustrating a karaoke system according to an embodiment of the invention.
  • the system illustrated in Figure 1 comprises a microphone 1540 that is configured to receive vocalization by a karaoke performer performing a current karaoke performance.
  • the system illustrated in Figure 1 further comprises a karaoke device 1500 that is connected to the microphone 1540.
  • the karaoke device 1500 may include an audio/video playback portion that is configured to play back a prerecorded karaoke song comprising multiple soundtracks, including at least instrumental background music soundtracks, and optionally a background vocals soundtrack and/or a lead vocals soundtrack.
  • the prerecorded karaoke song may comprise graphics, including lyrics of the currently performed song.
  • the karaoke device 1500 may further include an amplifier portion that is configured to amplify the played back soundtracks before they are output to speakers 1521 and 1522.
  • the karaoke device 1500 may further include a mixer portion that is configured to mix the performer's vocalization received from the microphone 1540 with the audio portion.
  • the prerecorded karaoke songs may be stored in the karaoke device 1500 or they may be stored elsewhere in the system and provided to the karaoke device 1500 for playback as needed.
  • the microphone 1540 may be integrated with at least one of mobile devices 1610, 1620, 1630 to allow one or more participants to utilize their mobile device microphone (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated microphone.
  • the karaoke device 1500 may be a conventional karaoke machine, such as a prior art karaoke machine commonly used in karaoke bars and karaoke boxes.
  • the karaoke device 1500 may be a game console provided with suitable software/hardware for performing the above described features.
  • the karaoke device 1500 may be a personal computer provided with suitable software/hardware for performing the above described features.
  • the karaoke device 1500 may be a mobile computing device, such as e.g. a smart phone, provided with suitable software/hardware for performing the above described features.
  • the system illustrated in Figure 1 further comprises the speakers 1521 and 1522, connected to the karaoke device 1500, that are configured to output the mixed audio signal received from the karaoke device 1500.
  • the system illustrated in Figure 1 further comprises a display 1510, such as a video monitor or a screen, that is configured to display the lyrics of the currently performed song in order to guide the performer to sing-along.
  • the system may also comprise several video cameras, each configured to capture the currently active karaoke participants.
  • the display 1510 or at least one of the additional displays may be integrated with at least one of the mobile devices 1610, 1620, 1630, respectively, to allow one or more participants to utilize their mobile device display (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated display.
  • the video camera 1530 may be integrated with at least one of the mobile devices 1610, 1620, 1630 to allow one or more participants to utilize their mobile device video camera (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated video camera.
  • the system illustrated in Figure 1 further comprises a personal computer 1000.
  • the personal computer 1000 comprises a number of features described in more detail below that constitute a portion of the invention.
  • another suitable device may be used instead of the personal computer 1000.
  • the personal computer 1000 may be replaced with e.g. a game console or a mobile computing device.
  • the personal computer 1000 and the karaoke device 1500 may be integrated into a single device.
  • the personal computer 1000 comprises an apparatus 1100 according to the invention which comprises a context information obtainer 1110, a karaoke output modifier 1120, as well as a speech recognizer 1130 that is configured to speech recognize words being vocalized via the microphone 1540 by the karaoke performer performing the current karaoke performance.
  • the context information obtainer 1110 is configured to obtain context information related to a current karaoke performance.
  • the context information obtainer 1110 is configured to obtain the context information by receiving indication from the speech recognizer 1130 that a predetermined key word is included in the vocalized and speech recognized words.
  • the context information obtainer 1110 is configured to obtain the context information by receiving indication from the speech recognizer 1130 that a predetermined lyrics portion was speech recognized as having been vocalized with a replacing lyrics portion.
  • the context information obtainer 1110 is configured to obtain the context information as external input data.
  • this external input data may be received e.g. via a keyboard, mouse or another input device (not shown in Figure 1) associated with the personal computer 1000.
  • this external input data may be received e.g. via a communications network (not shown in Figure 1), such as the Internet or a local area network connected to the personal computer 1000.
  • this external input data may be received e.g. from an external device, such as a sensor device 1300, a mobile device 1610 associated with the performer or at least one of mobile devices 1620 and 1630 associated with audience participants.
  • the external input data received from the mobile devices 1610, 1620, and/or 1630 may include e.g. personal information manager data associated with at least one participant of the current karaoke performance.
  • the external input data may also include information on media files on the device of at least one participant.
  • the external input data may also include the audio data input (sung or spoken) by at least one of the participants to the microphone of his/her mobile phone.
  • the external input data received from the sensor device 1300 may be e.g. external performance sensor data. It is to be understood that the term "external" is herein used to refer to "external to a conventional karaoke system".
  • the personal information manager data associated with the at least one participant may comprise at least one of calendar data, contacts list data, and presence service data.
  • the term "presence service" refers to an information service that maintains status information related to a user's availability for communication. A given user's availability status for communication may be distributed or published for other users via the presence service. The user may update his/her availability status as desired. It is to be understood that herein the term "presence service" also encompasses "Extended Presence", disclosed e.g. in protocol suites XEP-0119 and XEP-0163, 8/2006, by the XMPP Standards Foundation (http://www.xmpp.org/xsf/).
  • the karaoke output modifier 1120 is configured to modify at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
  • the karaoke output modifier 1120 may comprise a soundtrack adapter 1121 that is configured to adapt at least one soundtrack of the current karaoke performance based on the obtained context information.
  • it is to be understood that herein the term "soundtrack" includes both prerecorded tracks, such as music tracks, and vocals sung during a karaoke performance by the actual karaoke performer as well as backing vocals sung by audience members.
  • the karaoke output modifier 1120 may further comprise a lyrics adapter 1122 that is configured to substitute at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
  • the personal computer 1000 further comprises a soundtrack storage 1210, a lyrics storage 1220, a rules storage 1230, a voice command storage 1240, and a modifiable words storage 1250. It is to be understood that these storages need not be arranged within the personal computer 1000. Rather, at least one of the storages 1210-1250 may be provided in another suitable location in connection with the karaoke system. Furthermore, at least two of these storages may be integrated with each other.
  • Figure 2a is a flow diagram illustrating a method according to an embodiment of the invention.
  • step 200 context information related to a current karaoke performance is obtained.
  • step 201 at least one audiovisual output aspect of the current karaoke performance is modified based on the obtained context information.
  • The embodiment of the invention illustrated in Figure 2a may be implemented in several ways.
  • Figures 2b-2e illustrate examples of such implementations.
  • words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized, step 210.
  • speech recognition refers to the art of recognizing, by means of a computer program, words or phrases spoken or sung by a user.
  • voice control refers to carrying out, with a computer system, commands associated with the recognized words.
  • Various forms of speech recognition systems useful e.g. for recognizing isolated words are described in Rabiner, Juang, “Fundamentals of Speech Recognition", Prentice-Hall, 1993.
  • the acoustic models commonly used in speech recognition systems may be trained using singing.
  • alternatively, the acoustic models trained with speech could be adapted with singing data using suitable adaptation techniques.
  • a karaoke song or music work may comprise multiple soundtracks, including instrumental background music soundtracks as well as an optional background vocals soundtrack and/or an optional lead vocals soundtrack.
  • the music works are instrumental versions or have the lead vocals removed.
  • the music work may be e.g. in a MIDI (Musical Instrument Digital Interface) or multitrack digital audio format. That is, the music work may consist of discrete tracks each with its own instrument.
  • the playback of the music work can be modified at step 212 (or at step 231 of Figure 2d) by the soundtrack adapter 1121 at least in the following ways: muting or unmuting a track and therefore an instrument; changing the arrangement from one genre (e.g. grunge) to another (e.g. jazz); changing the playback from one song part to another (to facilitate this, the start positions of song parts such as intro, verse, chorus, and coda may be annotated in the music work).
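As an illustration, the playback modifications described above can be sketched with a simple in-memory model of a multitrack music work. This is a minimal sketch under assumed data structures; the class and method names are illustrative and do not appear in the original text.

```python
# Illustrative model of a multitrack karaoke song whose playback can be
# modified by muting/unmuting tracks and jumping between annotated song parts.

class MultitrackPlayback:
    def __init__(self, tracks, song_parts):
        # tracks: dict mapping track name -> initial muted flag
        # song_parts: dict mapping part name (intro, verse, ...) -> start
        # position, here expressed in beats (an assumption)
        self.tracks = {name: {"muted": muted} for name, muted in tracks.items()}
        self.song_parts = song_parts
        self.position = 0  # current playback position in beats

    def mute(self, track_name):
        # Muting a track silences the corresponding instrument.
        self.tracks[track_name]["muted"] = True

    def unmute(self, track_name):
        self.tracks[track_name]["muted"] = False

    def jump_to_part(self, part_name):
        # Changing playback from one song part to another relies on the
        # annotated start positions of the parts.
        self.position = self.song_parts[part_name]

playback = MultitrackPlayback(
    tracks={"drums": False, "extra_drums": True, "brass": False},
    song_parts={"intro": 0, "verse": 16, "chorus": 48, "coda": 112},
)
playback.unmute("extra_drums")   # e.g. in response to "more drums"
playback.jump_to_part("chorus")  # e.g. in response to "take me to the chorus"
print(playback.position)         # 48
```

Changing the arrangement genre could be modelled the same way, e.g. by swapping in a different set of tracks for the same song parts.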
  • modification of a music work to be used in the karaoke performance may comprise selecting a suitable accompaniment track according to the context information. For example, there may be several versions of each musical work, each played with a different musical style. When contextual information, such as performer profession, is obtained, an accompaniment track of an associated style may be selected. For example, a synth pop style accompaniment track may be selected for a performer who is an engineer by profession.
  • a set of predefined voice commands such as e.g. “more drums”, “less brass”, “take me to the chorus”, “take it faster”, and “one more time boys” may be provided, for example in the voice command storage 1240.
  • the voice commands may be shown e.g. on the display 1510 so as to allow the performer to learn them.
  • the apparatus 1100 of the invention observes - via speech recognition performed (step 210) by the speech recognizer 1130 - the words or phrases sung or spoken by the performer.
  • the context information obtainer 1110 detects (step 211) that the performer has given a voice command included in the above set of predefined voice commands
  • the soundtrack adapter 1121 executes a predetermined action associated with the detected voice command. For example, when the user exclaims "more drums", the soundtrack adapter 1121 may unmute a track with additional drumming. As another example, when the performer shouts "take me to the chorus", the playback may jump to the beginning of the chorus part of the current music work.
  • Rules for modifying the playback may be provided, for example in the rules storage 1230 which may be e.g. a data file arranged in the hard drive (not shown in Figure 1) of the personal computer 1000. These rules may define e.g. that the voice command "more drums" is executed by activating a certain track of the respective MIDI or multitrack digital audio file. Additionally, the apparatus 1100 may also have capability to support the singer e.g. in case the singer forgets the melody of the song. The predefined voice command for this functionality may be e.g. "help me". When this command is activated, the soundtrack adapter 1121 may e.g. play the melody as an instrumental or vocal version.
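A minimal sketch of such a rules storage, assuming it is a plain dictionary mapping each predefined voice command to a playback action. The command strings come from the text above; the action encoding (a tuple of action name and target) is an assumption.

```python
# Hypothetical rules storage: recognized voice command -> playback action.
VOICE_COMMAND_RULES = {
    "more drums": ("unmute_track", "extra_drums"),
    "less brass": ("mute_track", "brass"),
    "take me to the chorus": ("jump_to_part", "chorus"),
    "help me": ("play_melody", "lead_melody"),
}

def detect_command(recognized_phrase):
    # Return the action associated with a recognized phrase, or None when
    # the phrase is ordinary singing rather than a predefined command.
    return VOICE_COMMAND_RULES.get(recognized_phrase.lower().strip())

print(detect_command("More drums"))   # ('unmute_track', 'extra_drums')
print(detect_command("la la la"))     # None
```

Returning None for non-command phrases matches the idea of comparing recognized words against a limited key word vocabulary to avoid unwanted playback modifications.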
  • the apparatus 1100 may compare the words included in the lyrics to the recognized words to facilitate avoiding unwanted modifications to the playback.
  • a dedicated button or other such means may be provided which the user activates when issuing a voice command.
  • the user can indicate explicitly to the apparatus 1100 that the words uttered while the button is pressed represent a voice command instead of normal singing. This is likely to improve the performance as the apparatus 1100 does not have to continuously record the sound input and compare it to the key word vocabulary but rather receives an explicit indication of a key word being uttered.
  • this kind of button or other such means could optionally be used to indicate when the user wishes to modify a word in the lyrics.
  • the user could activate the button before singing the replacing lyrics portion.
  • words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized, step 220.
  • the embodiment of the invention illustrated in Figure 2c allows the karaoke performer to change - while still simultaneously performing the current song - at least parts of the original lyrics shown on the display 1510.
  • predetermined categories of words may be changed by the performer.
  • the lyrics of many songs contain names of people and places.
  • the karaoke performer may e.g. change the name of a person or a place that is mentioned in the music work.
  • This may be implemented by allowing certain categories of words, e.g. the names of people and places, to be replaced with other words: the apparatus 1100 of the invention recognizes the words sung or spoken by the performer via speech recognition performed (step 220) by the speech recognizer 1130, compares these words to the original lyrics (step 221), and if some words (of those that are allowed to be changed) are different, they are substituted in the lyrics shown on the screen whenever that word appears again during the karaoke performance (step 222).
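The compare-and-substitute flow of steps 220-222 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the set of replaceable words, the word-by-word alignment, and the function names are all assumptions.

```python
# Words tagged as replaceable, e.g. names of people and places (assumption:
# a simple lowercase set stands in for the tagged lyrics storage).
REPLACEABLE = {"mary", "london"}

def update_substitutions(original_line, recognized_line, substitutions):
    # Compare what should have been sung with what was actually recognized,
    # word by word; only words tagged as replaceable may be changed.
    for orig, sung in zip(original_line.split(), recognized_line.split()):
        if orig.lower() in REPLACEABLE and sung.lower() != orig.lower():
            substitutions[orig] = sung
    return substitutions

def render_line(original_line, substitutions):
    # Apply accumulated substitutions whenever a replaced word appears again.
    return " ".join(substitutions.get(w, w) for w in original_line.split())

subs = update_substitutions("Mary lives in London", "Anna lives in Paris", {})
print(render_line("Mary oh Mary come back to London", subs))
# Anna oh Anna come back to Paris
```

A real system would align recognized words to the lyrics in time rather than position, but the substitution bookkeeping would look much the same.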
  • the apparatus 1100 may be configured to present the words that are allowed to be replaced e.g. with a special text colour, on the display 1510.
  • the system may provide a predetermined number of alternatives for each word that can be substituted.
  • This can be implemented to limit the amount of alternative words to be speech recognized and increase the robustness of the speech recognition e.g. in a noisy environment.
  • the system is arranged to recognize the vocalized word from the limited set of possible alternative words.
  • the alternatives for words can be obtained using the collected context information: for example the names from the Contacts list of the performer may be provided as alternatives for person names in the lyrics, names of places visited by the performer may be provided as alternatives for names of places, and so on.
  • the alternatives may be shown to the performer who may then vocalize one of the alternatives, either the original or one of the changed ones.
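Restricting recognition to a small closed set of alternatives, e.g. names drawn from the performer's Contacts list, can be sketched as below. Here `difflib` string similarity stands in for a real acoustic scoring step, which is an assumption made purely for illustration.

```python
import difflib

def recognize_from_alternatives(heard, alternatives):
    # Match a (possibly noisy) recognition result against a limited set of
    # allowed alternative words; matching a closed vocabulary is more robust
    # in a noisy environment than open-vocabulary recognition.
    matches = difflib.get_close_matches(heard.lower(),
                                        [a.lower() for a in alternatives],
                                        n=1, cutoff=0.6)
    if not matches:
        return None  # nothing close enough: keep the original lyrics word
    # Map back to the original spelling of the matched alternative.
    for a in alternatives:
        if a.lower() == matches[0]:
            return a

# Original lyrics name plus names from the performer's Contacts list.
alternatives = ["Mary", "Anna", "Johanna"]
print(recognize_from_alternatives("ana", alternatives))   # Anna
```

The cutoff of 0.6 is an arbitrary illustrative threshold; tuning it trades recognition coverage against false substitutions.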
  • Lyrics for the various songs may be provided e.g. in the lyrics storage 1220 which may be e.g. a text file arranged in the hard drive (not shown in Figure 1) of the personal computer 1000.
  • in the lyrics storage 1220, the names of people and places may be tagged in a suitable way, for example. There may be a tag for each category of words that are allowed to be replaced.
  • the replacement words may be provided in the modifiable words storage 1250, e.g. a suitable database, where these words may be organized e.g. on the basis of the above tags.
  • the apparatus 1100 may be configured to recognize when a masculine name is replaced in vocalization with a feminine name (or vice versa), and to consequently substitute "he" with "she" (or vice versa) in the displayed lyrics.
  • external input data is fetched or received for use as the performance context information, step 230.
  • at least one soundtrack of the current karaoke performance is adapted based on the fetched or received external input data, step 231.
  • the apparatus 1100 of the invention detects using the context information obtainer 1110 the context (i.e. the circumstances and conditions) surrounding the karaoke performance situation (step 230), and adapts using the soundtrack adapter 1121 the background music soundtracks on the basis of the properties of the detected performance context (step 231).
  • this context information may be received as external input data e.g. via a keyboard, a mouse or another input device (not shown in Figure 1) associated with the personal computer 1000; or via a communications network (not shown in Figure 1), such as the Internet or a local area network connected to the personal computer 1000; or e.g. from an external device, such as the sensor device 1300, the mobile device 1610 associated with the performer, or at least one of the mobile devices 1620 and 1630 associated with audience participants.
  • Sensor devices 1300 that can be used include e.g. a microphone, a video camera, a positioning device (e.g. a Global Positioning System device), as well as illumination, humidity, temperature, blood pressure and heart rate meters.
  • the context information obtainer 1110 may also access personal information manager data (including presence service data, calendar data and contacts list data, as described above) on the mobile device 1610 of the karaoke performer and/or the mobile devices 1620-1630 of the audience participants via e.g. Bluetooth, and use this data to determine the performance context.
  • the context information obtainer 1110 may also access other personal data, such as a location where some of the participants' pictures have been taken, to determine e.g. the latest travel destination.
  • the context information obtainer 1110 may also access information on the media files on the performer's and participants' mobile devices 1610- 1630, such as their favourite songs, artists, and/or music genres.
  • the context information obtainer 1110 may also obtain context information from an external server, e.g. by querying with the GPS location.
  • the external server might search for musical artists and bands who have lived close to the current location (e.g. in the same city) , or who have some other relation to the particular location, and provide information on the styles and genre of their music to be used when adapting the musical performance.
  • Properties of the performance context may include, for example, the time of day/month/year, the people who participate in the karaoke performance or who are in the audience, the number of people, the physiological state of the karaoke performers, the background of the people (e.g. their gender, what their day was like, what people they know, etc.), and a location where they have been lately travelling. This information may then be used to modify the properties of the music.
  • the sensors may be attached to the apparatus 1100 or to remote devices (such as the mobile devices 1610-1630) that transfer the data to the apparatus 1100.
  • some people may have heart rate meters in their wristwatches and the apparatus 1100 may access the heart rate data directly from the watch or via a mobile device.
  • a set of rules may be provided, e.g. in the rules storage 1230, that define how the music is adapted on the basis of the properties of the context.
  • the playback of the background music may be modified in ways similar to those described above in connection with step 212 of Figure 2b.
  • the playback of the background music may be adapted on the basis of the properties of the detected performance context for example in the following ways:
• the music arrangement may be modified to resemble the frequently played artist/album/song/music genre, the favourite songs, or the favourite music genre of the performer or the participants.
• For example, a solo portion in the arrangement may be replaced with one from the favourite song of the performer, or the arrangement may follow the style common for the favourite genre of the performer or most of the participants.
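A rules storage such as element 1230 could be realized as an ordered list of (predicate, adaptation) pairs applied to the current arrangement. The sketch below assumes a simple dictionary representation of both the context and the arrangement; the specific rule conditions and parameter names are illustrative only.

```python
# Each rule maps a predicate over the context to an adaptation of the arrangement.
RULES = [
    (lambda ctx: ctx.get("favourite_genre") == "rock",
     lambda arr: {**arr, "guitar_distortion": 0.8}),
    (lambda ctx: ctx.get("audience_size", 0) > 10,
     lambda arr: {**arr, "chorus_repeats": arr.get("chorus_repeats", 1) + 1}),
]

def adapt_arrangement(arrangement, context):
    """Apply every rule whose predicate matches the performance context."""
    for predicate, adaptation in RULES:
        if predicate(context):
            arrangement = adaptation(arrangement)
    return arrangement

base = {"tempo_bpm": 120, "chorus_repeats": 1}
adapted = adapt_arrangement(base, {"favourite_genre": "rock", "audience_size": 15})
print(adapted["guitar_distortion"], adapted["chorus_repeats"])  # 0.8 2
```

Keeping the rules as data rather than code would let an event organizer edit them without touching the playback engine.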
• the favourite music tracks of the user or the audience may be suggested for use in the karaoke performance.
• the system may suggest that the karaoke performance consist of a potpourri of the favourite songs of the performer: in this case the system plays parts of the favourite songs of the performer.
• the system may request more than one karaoke participant (members of the audience) to participate in the karaoke performance.
• one or more audience members may participate by singing the chorus parts of songs.
• the audience members may sing the chorus part to the microphone of their mobile terminals, which is then mixed with the accompaniment and singing of the karaoke performer.
  • audience members may also provide the backing vocals while the main karaoke performer is singing the lead melody.
  • the backing vocals may be processed with effects like pitch shift in order to create a complex choir sound.
  • vocals of each background singer may be processed differently. For example, female backing vocals may be shifted down by an octave in pitch to make them sound more masculine.
• the system may construct the karaoke play list from the favorite songs of the karaoke participants.
• the play list may consist of whole songs, or alternatively the system may play potpourris of complete songs, such that it concatenates parts (e.g. choruses or verses) of the favorite songs of the participants in a seamless fashion, and transitions between songs occur without breaks.
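Building such a seamless potpourri amounts to scheduling one favourite section per participant so that each segment starts exactly when the previous one ends. A minimal sketch (segment lengths and field names are assumed for illustration):

```python
def build_potpourri(segments):
    """segments: list of (participant, song, duration_s) tuples, one favourite
    section per participant. Returns play entries with gapless start times."""
    playlist, t = [], 0.0
    for participant, song, duration in segments:
        playlist.append({"participant": participant, "song": song,
                         "start_s": t, "duration_s": duration})
        t += duration  # the next segment starts exactly when this one ends
    return playlist

pl = build_potpourri([("X", "Song A", 30.0), ("Y", "Song B", 25.0)])
print(pl[1]["start_s"])  # 30.0
```

A real implementation would additionally crossfade or beat-match at the segment boundaries; that is omitted here.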
  • Each participant may furthermore be requested to sing in turn his favorite song. In this case, the karaoke performer is changed during the performance.
  • the karaoke system may activate the microphone of the mobile device of participant X, start mixing the audio input from participant X's microphone to the music accompaniment, and mute the microphone of the previous participant.
• the system may send an indication to participant X's mobile terminal to let him know that it is his turn to sing.
  • This indication may be e.g. a visual indication, such as text on the screen of participant X's mobile terminal, or e.g. a vibration alarm.
  • the system may also show a text in the main karaoke screen, such as "Mr X, now it's your favorite song playing, please start singing" .
  • the system may loop the first few measures of the current song, until the requested participant sings the first few phrases to the microphone.
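Looping the opening measures until the requested participant starts singing can be sketched as a bounded loop around a singing detector; the detector interface and the loop limit below are assumptions for illustration:

```python
def loop_intro(measures, singing_detected, max_loops=8):
    """Repeat the opening measures until the requested participant is heard
    singing, or give up after max_loops repetitions."""
    played = []
    for _ in range(max_loops):
        played.extend(measures)
        if singing_detected():  # e.g. voice activity on the participant's mic
            break
    return played

calls = {"n": 0}
def detector():
    calls["n"] += 1
    return calls["n"] >= 3  # the singer comes in during the third repetition

print(len(loop_intro(["m1", "m2"], detector)))  # 6
```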
  • the karaoke device may give points to the participants based on how alert they were: participants may receive scores depending on the speed they started singing, especially in the case when the karaoke play list is made as a potpourri of the favorite songs of the karaoke participants. The best scores and/or response times may be shown on the video screen to reward the most alert (fastest responding) participants.
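The alertness scoring described above can be implemented as a simple penalty on the measured response time; the maximum score and the penalty rate below are arbitrary illustrative values:

```python
def alertness_scores(response_times_s, max_score=100, penalty_per_s=10):
    """Score each participant by how quickly they started singing after
    their part began; faster responses earn higher scores, floored at 0."""
    return {name: max(0, max_score - int(penalty_per_s * t))
            for name, t in response_times_s.items()}

scores = alertness_scores({"X": 0.5, "Y": 3.0, "Z": 12.0})
leaderboard = sorted(scores.items(), key=lambda kv: -kv[1])
print(leaderboard)  # [('X', 95), ('Y', 70), ('Z', 0)]
```

The sorted leaderboard is what would be shown on the video screen to reward the fastest-responding participants.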
• the system may show a video picture of the currently active participant on the main screen of the karaoke performance. When the karaoke song and performer are changed without breaks, the video stream from the current performer's mobile device is used. When several participants are singing the chorus part, the system may show the video streams from the mobile devices of all the singers singing the chorus.
• the system may also show a video where the main video on the background shows the original music performers performing the music track, but where parts of the original music video are replaced with video streams from participants' mobile devices.
  • the head of the main singer in the original music video may be replaced with the video stream showing the karaoke performer's head.
  • the heads of the chorus singers may be replaced with the video streams showing the faces of the karaoke participants currently singing the chorus.
  • still pictures representing the karaoke participants may be portrayed on top of the heads of the performers on the original karaoke video.
• external input data is fetched or received for use as the performance context information, step 240.
  • at least one predetermined lyrics portion of the current karaoke performance is substituted when next displayed based on the fetched or received external input data, step 241.
• the apparatus 1100 of the invention detects, using the context information obtainer 1110, the context (i.e. the circumstances and conditions) surrounding the karaoke performance situation (step 240), and adapts, using the lyrics adapter 1122, the lyrics on the basis of the properties of the detected performance context (step 241).
  • the context information itself may be similar to that used in connection with the embodiment of Figure 2d. This time, however, it is utilized to adapt or modify the displayed lyrics.
• categories of words that are allowed to be changed in the lyrics of a music work may be defined. Such categories may include e.g. names of people and places.
• the lyrics may be adapted on the basis of the properties of the detected context for example in the following ways:
• the person's previous karaoke history may affect the result in new karaoke songs: e.g. if the user has sung the same song many times previously, it may have more radical changes compared to a new song.
  • the audience may affect the displayed lyrics by sending words from the mobile devices 1620-1630 e.g. via Bluetooth or Short Message Service (SMS) to the apparatus 1100.
• the audience may affect the content of the lyrics in the karaoke performance in the following way: certain words in the karaoke lyrics are tagged such that they can be changed. This tagging may have been performed e.g. in advance at some point when the karaoke content was made. Assuming now that e.g. some or all nouns in the lyrics have been tagged to allow substitutions, the audience may then participate in the karaoke performance by sending, e.g. via their mobile devices 1620-1630, words that they would like to include in the karaoke performance.
  • the people send nouns to the apparatus 1100.
  • the apparatus 1100 maintains a list of the nouns received from the audience, and as the karaoke performance progresses, replaces a noun in the lyrics with a noun provided by the audience.
  • the apparatus 1100 may go through the list of nouns provided by the audience and select such new words that will rhyme.
  • other word categories may be replaced as well, such as verbs and proper nouns.
• the corresponding word categories may have been tagged in the lyrics as well.
  • the apparatus 1100 will replace a word of the lyrics in a given category with a word provided by the audience in the same category. For example, every proper noun (such as person names) is replaced with a person name provided by the audience.
• the names may also be taken from the contact list applications of the participants, such that e.g. every person name in the lyrics is replaced with the name of some participant in the karaoke audience.
• the apparatus 1100 may also be configured to automatically recognize the word categories from the lyrics and the words that people send. In this case, the apparatus 1100 may automatically pick words from the correct word category to replace words in the lyrics.
  • the apparatus 1100 may choose the word randomly, let the audience vote for the best solution, or choose a best matching word for the context.
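The tagged-word replacement with a rhyme preference can be sketched as follows. Note that the rhyme test here is a crude suffix comparison rather than real phonetic matching, and the `<noun:...>` tagging syntax is invented for this illustration:

```python
import re

def rhyme_key(word):
    # Crude approximation of a rhyme: the last two letters
    # (an assumption, not real phonetics).
    return word[-2:].lower()

def choose_replacement(original, candidates, strategy="rhyme"):
    """Pick an audience-provided word: prefer one that rhymes, otherwise
    fall back to the first candidate (or keep the original word)."""
    if strategy == "rhyme":
        rhyming = [w for w in candidates if rhyme_key(w) == rhyme_key(original)]
        if rhyming:
            return rhyming[0]
    return candidates[0] if candidates else original

def substitute_tagged(lyrics, candidates, strategy="rhyme"):
    """Replace words tagged as <noun:word> with audience-provided candidates."""
    return re.sub(r"<noun:(\w+)>",
                  lambda m: choose_replacement(m.group(1), candidates, strategy),
                  lyrics)

print(substitute_tagged("My <noun:cat> sat on the mat", ["beer", "hat", "dog"]))
# My hat sat on the mat
```

The strategy argument could equally be "random" or "vote", matching the selection methods listed above.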
• the event organizer, for example, may decide which of these is the preferred method for word selection.
• the audience may receive information from the apparatus 1100 about the words that can be changed, or about the categories of the words that can be changed.
  • the apparatus 1100 may deliver this information e.g. via Short Message Service, Bluetooth, or via a web browser.
• the apparatus 1100 may return a list of words that can be changed, or the whole lyrics that show changeable lyrics with special colors.
  • the apparatus 1100 may send the audience information that the audience should now send nouns or verbs to the apparatus 1100.
  • the audience may control which words are replaced by which. For example, a member of the audience might send a command "replace cat -> beer" to explicitly force the word cat to be replaced with the word beer.
• the members of the audience could e.g. take a look at a web page provided by the apparatus 1100 with the lyrics written down and text boxes in the place of words that can be changed, write the alternative words to the boxes and then submit the information as a web form to the apparatus 1100.
• the audience may be allowed to affect the balance of the musical performance in addition to being able to adjust individual parameters from different instrument tracks, such as echo, distortion, delay, chorus etc.
  • the actual adjusting can be implemented so that an audience member is able to list the available parameters that can be modified e.g. by utilizing a Bluetooth connection to the host computer.
• a suitable user interface or UI is transferred to the user's mobile device (or computer).
  • the UI may contain e.g. sliders for controlling the amount of distortion for an instrument track. The modifications are applied right away so that the user can hear the result of his actions immediately.
  • Changing the balance of the musical performance works in a similar manner: a UI is sent to the user's mobile device.
  • the UI includes volume sliders for each selected track of music.
• an audience member could affect the balance of the musical performance by sending SMS messages with simple instructions like "more bass" or "less echo".
• the system may optionally change the audio signal that comes from the user microphone and goes to the speakers. For example, when the user sings "Bill", the system may modify the audio signal so that the audience can hear "Jill" from the speakers.
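Such free-text instructions can be parsed with a small command interpreter that nudges a named mix parameter up or down; the parameter names, step size, and clamping range below are illustrative assumptions:

```python
def apply_command(mix, command, step=0.1):
    """Parse instructions like 'more bass' or 'less echo' and nudge the
    corresponding mix parameter, clamped to the range [0, 1]."""
    direction, _, parameter = command.strip().lower().partition(" ")
    if parameter not in mix or direction not in ("more", "less"):
        return mix  # ignore unrecognised commands
    delta = step if direction == "more" else -step
    updated = dict(mix)
    updated[parameter] = round(min(1.0, max(0.0, updated[parameter] + delta)), 3)
    return updated

mix = {"bass": 0.5, "echo": 0.5}
mix = apply_command(mix, "more bass")
mix = apply_command(mix, "less echo")
print(mix)  # {'bass': 0.6, 'echo': 0.4}
```

Unrecognised messages are silently ignored, which is the safer default when arbitrary SMS text arrives from the audience.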
  • the audience may specify which words are changed through their devices that are connected to the system, in the same way as specifying the lyrics that are displayed.
  • the system may synthesize singing of the audience defined word and mix it into the output instead of the singer output, or it may filter the singer output in order to change certain phonemes to make the singing sound like the words specified by the audience.
  • the above described lead vocals and backing vocals modifications may be performed in response to requests received from the audience members.
• the exemplary embodiments can include, for example, any suitable servers, workstations, game consoles, personal computers, karaoke devices, mobile devices, and the like, capable of performing the processes of the exemplary embodiments.
  • the devices and subsystems of the exemplary embodiments can communicate with each other using any suitable protocol and can be implemented using one or more programmed computer systems or devices.
• One or more interface mechanisms can be used with the exemplary embodiments, including, for example, Internet access, telecommunications in any suitable form (e.g., voice, modem, and the like), wireless communications media, and the like.
• employed communications networks or links can include one or more wireless communications networks, cellular communications networks, 3G communications networks, Public Switched Telephone Networks (PSTNs), Packet Data Networks (PDNs), the Internet, intranets, a combination thereof, and the like.
• the exemplary embodiments are for exemplary purposes, as many variations of the specific hardware used to implement the exemplary embodiments are possible, as will be appreciated by those skilled in the hardware and/or software art(s).
• the functionality of one or more of the components of the exemplary embodiments can be implemented via one or more hardware and/or software devices.
• the exemplary embodiments can store information relating to various processes described herein. This information can be stored in one or more memories, such as a hard disk, optical disk, magneto-optical disk, RAM, and the like.
  • One or more databases can store the information used to implement the exemplary embodiments of the present inventions.
  • the databases can be organized using data structures (e.g., records, tables, arrays, fields, graphs, trees, lists, and the like) included in one or more memories or storage devices listed herein.
  • the processes described with respect to the exemplary embodiments can include appropriate data structures for storing data collected and/or generated by the processes of the devices and subsystems of the exemplary embodiments in one or more databases.
• All or a portion of the exemplary embodiments can be conveniently implemented using one or more general purpose processors, microprocessors, digital signal processors, micro-controllers, and the like, programmed according to the teachings of the exemplary embodiments of the present inventions, as will be appreciated by those skilled in the computer and/or software art(s).
• Appropriate software can be readily prepared by programmers of ordinary skill based on the teachings of the exemplary embodiments, as will be appreciated by those skilled in the software art.
• Further, the exemplary embodiments can be implemented on the World Wide Web.
• the exemplary embodiments can be implemented by the preparation of application-specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be appreciated by those skilled in the electrical art(s).
  • the exemplary embodiments are not limited to any specific combination of hardware and/or software.
• the exemplary embodiments of the present inventions can include software for controlling the components of the exemplary embodiments, for driving the components of the exemplary embodiments, for enabling the components of the exemplary embodiments to interact with a human user, and the like.
  • software can include, but is not limited to, device drivers, firmware, operating systems, development tools, applications software, and the like.
• Such computer readable media further can include the computer program product of an embodiment of the present inventions for performing all or a portion (if processing is distributed) of the processing performed in implementing the inventions.
• Computer code devices of the exemplary embodiments of the present inventions can include any suitable interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes and applets, complete executable programs, Common Object Request Broker Architecture (CORBA) objects, and the like. Moreover, parts of the processing of the exemplary embodiments of the present inventions can be distributed for better performance, reliability, cost, and the like.
  • the components of the exemplary embodiments can include computer readable medium or memories for holding instructions programmed according to the teachings of the present inventions and for holding data structures, tables, records, and/or other data described herein.
  • Computer readable medium can include any suitable medium that participates in providing instructions to a processor for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, transmission media, and the like.
  • Non-volatile media can include, for example, optical or magnetic disks, mag- neto-optical disks, and the like.
  • Volatile media can include dynamic memories, and the like.
  • Transmission media can include coaxial cables, copper wire, fiber optics, and the like.
  • Transmission media also can take the form of acoustic, optical, electromagnetic waves, and the like, such as those generated during radio frequency (RF) communications, infrared (IR) data communications, and the like.
• Common forms of computer-readable media can include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other suitable magnetic medium, a CD-ROM, CD-R, CD-RW, DVD, DVD-ROM, DVD±RW, DVD+R, any other suitable optical medium, punch cards, paper tape, optical mark sheets, any other suitable physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other suitable memory chip or cartridge, a carrier wave or any other suitable medium from which a computer can read.

Abstract

The invention allows a performer and an audience to interact with a karaoke system in order to control or modify audiovisual output of a karaoke performance. Context information related to a current karaoke performance is obtained. In response, at least one audiovisual output aspect of the current karaoke performance is modified based on the obtained context information.

Description

TITLE OF THE INVENTION:
MODIFYING AUDIOVISUAL OUTPUT IN A KARAOKE SYSTEM BASED ON PERFORMANCE CONTEXT
BACKGROUND OF THE INVENTION:
Field of the Invention:
The invention generally relates to multimedia entertainment systems. In particular, the invention relates to karaoke systems .
Description of the Related Art:
Karaoke is a form of entertainment, originating in Japan, in which one or more singers, typically amateurs, sing along with recorded music on a microphone. Often, the music is of a popular or well-known song with the voice of the original singer absent or reduced in volume. Lyrics are usually also displayed to the performer or performers, e.g. on a music video, to guide the sing-along. A conventional karaoke system contains at least one microphone, loudspeakers, an audio amplifier and/or a mixer optionally equipped with an effects unit for creating e.g. reverberation effects, a video display for the performer, as well as some kind of an audiovisual playback system, such as e.g. a video playback system or a local or remote computer-based system with video playback capability. The conventional karaoke system may also include multiple video screens for the audience and a video camera used for displaying the performer on the video screens.
Typically, in addition to displaying the lyrics, karaoke systems also have a visual method for indicating on the performer's video display (and sometimes also on the audience video screens) which word of the lyrics should be sung at a given moment, to further guide the sing-along. These visual methods include e.g. changing the colour of the lyrics in synchronization with the music being played back, and a ball that bounces on top of the current words in synchronization with the music being played back.
The music is typically recorded in a multi-track format comprising discrete soundtracks, e.g. each with its own instrument. The music works may be instrumental versions. Alternatively, the music works may also contain a soundtrack with lead vocals wherein the lead vocals soundtrack is muted or suppressed while the performer is singing. In other words, a karaoke song or music work is comprised of several soundtracks, most or all of which are instrumental background music soundtracks. In addition to the instrumental background music soundtracks, the karaoke song or music work may comprise a background vocals soundtrack as well as a lead vocals soundtrack. The instrumental background music soundtracks may be prerecorded ones, such as e.g. digital audio soundtracks, or the instrumental background music soundtracks may be synthesized ones, such as e.g. MIDI (Musical Instrument Digital Interface) soundtracks.
However, prior art karaoke systems provide very limited - if any - options for the performer and the audience to interact with the karaoke system in order to control or modify audiovisual output of a karaoke performance. For example, while the performer may sing whatever he/she chooses, the original lyrics are still shown on the video display and screens. That is, prior art fails to teach or suggest adapting the displayed lyrics on a situation-by-situation basis. Similarly, prior art fails to teach or suggest adapting or modifying the instrumental background music soundtracks interactively based on context or circumstances around a current karaoke performance.
SUMMARY OF THE INVENTION:
A first aspect of the present invention is a method in which context information related to a current karaoke performance is obtained. In response, at least one audiovisual output aspect of the current karaoke performance is modified based on the obtained context information.
A second aspect of the present invention is an apparatus which comprises a context information obtainer configured to obtain context information related to a current karaoke performance. The apparatus of the second aspect further comprises a karaoke output modifier configured to modify at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
A third aspect of the present invention is an apparatus which comprises a context information obtaining means for obtaining context information related to a current karaoke performance. The apparatus of the third aspect further comprises a karaoke output modifying means for modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
A fourth aspect of the present invention is a computer program embodied on a computer readable medium, the computer program controlling a data-processing device to perform: obtaining context information related to a current karaoke performance, and modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
A fifth aspect of the present invention is a system which comprises at least one microphone configured to receive vocalization by at least one karaoke performer performing a current karaoke performance. The system of the fifth aspect further comprises a karaoke device configured to produce a karaoke signal comprising a video portion and an audio portion, and to mix the vocalization received by the microphone with the audio portion. The system of the fifth aspect further comprises at least one speaker configured to output the mixed audio portion, as well as a display configured to output the video portion. The system of the fifth aspect further comprises a context information obtainer configured to obtain context information related to the current karaoke performance. The system of the fifth aspect further comprises a karaoke output modifier configured to modify at least one of the video portion and the mixed audio portion based on the obtained context information.

In an embodiment of the invention, the method of the first aspect is performed by a data-processing device controlled by a computer program embodied on a computer readable medium.
In an embodiment of the invention, words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized.
In an embodiment of the invention, the context information is obtained by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
In an embodiment of the invention, the context information is obtained by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.

In an embodiment of the invention, the context information is obtained as external input data.
In an embodiment of the invention, the context information is obtained as the external input data from an external device.

In an embodiment of the invention, the context information is obtained as external performance sensor data.

In an embodiment of the invention, the context information is obtained as personal information manager data associated with at least one participant of the current karaoke performance.

In an embodiment of the invention, the at least one audiovisual output aspect is modified by adapting at least one soundtrack of the current karaoke performance based on the obtained context information.

In an embodiment of the invention, the at least one audiovisual output aspect is modified by substituting at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.

It is to be understood that the aspects and embodiments of the invention described above may be used in any combination with each other. Several of the aspects and embodiments may be combined together to form a further embodiment of the invention. A method, an apparatus, a system or a computer program which is an aspect of the invention may comprise at least one of the embodiments of the invention described above.
The invention allows the performer and the audience to interact with the karaoke system in order to control or modify audiovisual output of a karaoke performance. For example, the invention allows modifying displayed lyrics on a situation-by-situation basis. As a further example, the invention allows modifying instrumental background music soundtracks interactively based on context or circumstances around a current karaoke performance.
BRIEF DESCRIPTION OF THE DRAWINGS:
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and together with the description help to explain the principles of the invention. In the drawings:
Fig. 1 is a block diagram illustrating a karaoke system according to an embodiment of the invention, and
Figs. 2a-2e are flow diagrams illustrating various methods according to embodiments of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS:
Reference will now be made in detail to the embodiments of the invention, examples of which are illustrated in the accompanying drawings. Figure 1 is a block diagram illustrating a karaoke system according to an embodiment of the invention.
The system illustrated in Figure 1 comprises a microphone 1540 that is configured to receive vocalization by a karaoke performer performing a current karaoke performance. The system illustrated in Figure 1 further comprises a karaoke device 1500 that is connected to the microphone 1540.
The karaoke device 1500 may include an audio/video playback portion that is configured to play back a prerecorded karaoke song comprising multiple soundtracks, including at least instrumental background music soundtracks, and optionally a background vocals soundtrack and/or a lead vocals soundtrack. Optionally, the prerecorded karaoke song may comprise graphics, including lyrics of the currently performed song. The karaoke device 1500 may further include an amplifier portion that is configured to amplify the played back soundtracks before they are output to speakers 1521 and 1522. The karaoke device 1500 may further include a mixer portion that is configured to mix the performer's vocalization received from the microphone 1540 with the audio portion. The prerecorded karaoke songs may be stored in the karaoke device 1500 or they may be stored elsewhere in the system and provided to the karaoke device 1500 for playback as needed.
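The mixer portion's core operation, summing the microphone signal into the accompaniment, can be sketched on raw signed 16-bit PCM samples as follows (the sample format and the hard-clipping behaviour are assumptions for illustration, not details of the disclosed device):

```python
def mix_pcm(accompaniment, vocals, vocal_gain=1.0):
    """Mix two equal-length sequences of signed 16-bit PCM samples,
    hard-clipping the sum to the valid sample range."""
    mixed = []
    for a, v in zip(accompaniment, vocals):
        s = a + int(vocal_gain * v)
        mixed.append(max(-32768, min(32767, s)))  # clamp to int16 range
    return mixed

print(mix_pcm([1000, -32000], [500, -2000]))  # [1500, -32768]
```

A production mixer would use soft limiting and per-track gain staging instead of hard clipping; this sketch only shows the summing step.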
It is to be understood that various components of the system illustrated in Figure 1 may be integrated with each other. For example, in an embodiment of the invention, the microphone 1540 may be integrated with at least one of mobile devices 1610, 1620, 1630 to allow one or more participants to utilize their mobile device microphone (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated microphone. The karaoke device 1500 may be a conventional karaoke machine, such as a prior art karaoke machine commonly used in karaoke bars and karaoke boxes. Alternatively, the karaoke device 1500 may be a game console provided with suitable software/hardware for performing the above described features. In another alternative, the karaoke device 1500 may be a personal computer provided with suitable software/hardware for performing the above described features. In yet another alternative, the karaoke device 1500 may be a mobile computing device, such as e.g. a smart phone, provided with suitable software/hardware for performing the above described features.
The system illustrated in Figure 1 further comprises the speakers 1521 and 1522, connected to the karaoke device 1500, that are configured to output the mixed audio signal received from the karaoke device 1500. Naturally, the number of speakers is not limited to two. Rather, there may be any number of speakers. The system illustrated in Figure 1 further comprises a display 1510, such as a video monitor or a screen, that is configured to display the lyrics of the currently performed song in order to guide the performer to sing along. In addition to the display 1510 intended for the performer, there may be additional displays (not shown in Figure 1) for the audience. The system illustrated in Figure 1 further comprises a video camera 1530 which may be used to record the current performer, and this recording may be e.g. displayed live on the audience displays while the performer is performing. It is to be understood that the video camera 1530 is optional and not required for the present invention. The system may also comprise several video cameras, each configured to display the currently active karaoke participants.
In an embodiment of the invention, the display 1510 or at least one of the additional displays may be integrated with at least one of the mobile devices 1610, 1620, 1630, respectively, to allow one or more participants to utilize their mobile device display (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated display. Similarly, in an embodiment of the invention, the video camera 1530 may be integrated with at least one of the mobile devices 1610, 1620, 1630 to allow one or more participants to utilize their mobile device video camera (not shown in Figure 1) in connection with a karaoke performance rather than a dedicated video camera.
The system illustrated in Figure 1 further comprises a personal computer 1000. In the embodiment of the invention illustrated in Figure 1, the personal computer 1000 comprises a number of features described in more detail below that constitute a portion of the invention. However, it is to be understood that instead of the personal computer 1000 another suitable device may be used. The personal computer 1000 may be replaced with e.g. a game console or a mobile computing device. In yet another embodiment, the personal computer 1000 and the karaoke device 1500 may be integrated into a single device.
In the embodiment of the invention illustrated in Figure 1, the personal computer 1000 comprises an apparatus 1100 according to the invention which comprises a context information obtainer 1110, a karaoke output modifier 1120, as well as a speech recognizer 1130 that is configured to recognize words being vocalized via the microphone 1540 by the karaoke performer performing the current karaoke performance.
The context information obtainer 1110 is configured to obtain context information related to a current karaoke performance. In one embodiment of the invention, the context information obtainer 1110 is configured to obtain the context information by receiving an indication from the speech recognizer 1130 that a predetermined key word is included in the vocalized and speech recognized words. In another embodiment of the invention, the context information obtainer 1110 is configured to obtain the context information by receiving an indication from the speech recognizer 1130 that a predetermined lyrics portion was speech recognized as having been vocalized with a replacing lyrics portion. In yet another embodiment of the invention, the context information obtainer 1110 is configured to obtain the context information as external input data.
In an embodiment, this external input data may be received e.g. via a keyboard, mouse or another input device (not shown in Figure 1) associated with the personal computer 1000. In another embodiment, this external input data may be received e.g. via a communications network (not shown in Figure 1), such as the Internet or a local area network connected to the personal computer 1000. In yet another embodiment, this external input data may be received e.g. from an external device, such as a sensor device 1300, a mobile device 1610 associated with the performer, or at least one of mobile devices 1620 and 1630 associated with audience participants. The external input data received from the mobile devices 1610, 1620, and/or 1630 may include e.g. personal information manager data associated with at least one participant (performer or audience) of the current karaoke performance. The external input data may also include information on media files on the device of at least one participant. The external input data may also include the audio data input (sung or spoken) by at least one of the participants to the microphone of his/her mobile phone. The external input data received from the sensor device 1300 may be e.g. external performance sensor data. It is to be understood that the term "external" is herein used to refer to "external to a conventional karaoke system". These various options for obtaining the context information will be described in more detail with reference to Figures 2a-2e.
In an embodiment of the invention, the personal information manager data associated with the at least one participant may comprise at least one of calendar data, contacts list data, and presence service data. As is known in the art, the term "presence service" refers to an information service that maintains status information related to a user's availability for communication. A given user's availability status for communication may be distributed or published for other users via the presence service. The user may update his/her availability status as desired. It is to be understood that herein the term "presence service" also encompasses "Extended Presence", disclosed e.g. in protocol suites XEP-0119 and XEP-0163, 8/2006, by XMPP Standards Foundation (http://www.xmpp.org/xsf/). The karaoke output modifier 1120 is configured to modify at least one audiovisual output aspect of the current karaoke performance based on the obtained context information. In an embodiment of the invention, the karaoke output modifier 1120 may comprise a soundtrack adapter 1121 that is configured to adapt at least one soundtrack of the current karaoke performance based on the obtained context information. It is to be understood that in this context the term "soundtrack" includes both prerecorded tracks, such as music tracks, and vocals sung during a karaoke performance by the actual karaoke performer as well as backing vocals sung by audience members.
Furthermore, the karaoke output modifier 1120 may further comprise a lyrics adapter 1122 that is configured to substitute at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information. Again, these various options for modifying the audiovisual output aspects of the current karaoke performance will be described in more detail with reference to Figures 2a-2e.
In the embodiment of the invention illustrated in Figure 1, the personal computer 1000 further comprises a soundtrack storage 1210, a lyrics storage 1220, a rules storage 1230, a voice command storage 1240, and a modifiable words storage 1250. It is to be understood that these storages need not be arranged within the personal computer 1000. Rather, at least one of the storages 1210-1250 may be provided in another suitable location in connection with the karaoke system. Furthermore, at least two of these storages may be integrated with each other.
Figure 2a is a flow diagram illustrating a method according to an embodiment of the invention. At step 200, context information related to a current karaoke performance is obtained. Then, at step 201, at least one audiovisual output aspect of the current karaoke performance is modified based on the obtained context information.
The embodiment of the invention illustrated in Figure 2a may be implemented in several ways. Figures 2b-2e illustrate examples of such implementations. In the embodiment illustrated in Figure 2b, words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized, step 210. At step 211, it is determined whether one or more predetermined key words are included in the words vocalized and speech recognized so far. If not, the method returns to step 210 wherein speech recognition is continued. If yes, at least one soundtrack of the current karaoke performance is adapted based on the recognized key word, step 212.
As is known in the art, the term "speech recognition" refers to the art of recognizing, by means of a computer program, words or phrases spoken or sung by a user. Related to the art of speech recognition is the art of voice control which refers to carrying out, with a computer system, commands associated with the recognized words. Various forms of speech recognition systems useful e.g. for recognizing isolated words are described in Rabiner, Juang, "Fundamentals of Speech Recognition", Prentice-Hall, 1993. To make the speech recognition system more robust for input in the form of singing, the acoustic models commonly used in speech recognition systems may be trained using singing. Alternatively or additionally, adaptation with singing data for the acoustic models trained with speech could be performed using e.g. the Maximum Likelihood Linear Regression method, as described in Hosoya, Suzuki, Ito, Makino, "Lyrics recognition from a singing voice based on finite state automaton for music information retrieval", in Proceedings of the 6th International Conference on Music Information Retrieval, London, UK, 11-15 September 2005. Moreover, several constraints can be used to improve the performance of recognition, such as knowledge of the lyrics of the song which may be used as the recognition grammar, knowledge of the current song position to indicate the current word, and limited vocabularies for the possible key words and/or word alternatives.
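As a rough sketch of the vocabulary constraints mentioned above, a recognizer's candidate words can be filtered against the expected lyrics word at the current song position plus a small key-word vocabulary. The function and data below are illustrative assumptions, not part of the publication:

```python
# Sketch: constrain a recognizer's candidate words to a limited
# per-position vocabulary. All names here are illustrative.

def constrain_candidates(candidates, song_position, lyrics, key_words):
    """Keep only candidates that are plausible at this song position:
    either the expected lyrics word or a predefined key word."""
    expected = lyrics[song_position] if song_position < len(lyrics) else None
    allowed = set(key_words)
    if expected is not None:
        allowed.add(expected)
    return [w for w in candidates if w in allowed]

lyrics = ["i", "love", "paris", "in", "the", "spring"]
key_words = {"more", "drums", "chorus"}

# A noisy recognizer proposes several words; only "paris" fits
# position 2, while "drums" is retained as a possible key word.
print(constrain_candidates(["paris", "pear", "drums"], 2, lyrics, key_words))
```

Limiting the vocabulary in this way is what makes the recognition robust in a noisy karaoke environment.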
As described above, a karaoke song or music work may comprise multiple soundtracks, including instrumental background music soundtracks as well as an optional background vocals soundtrack and/or an optional lead vocals soundtrack. Typically, however, the music works are instrumental versions or have the lead vocals removed. The music work may be e.g. in a MIDI (Musical Instrument Digital Interface) or multitrack digital audio format. That is, the music work may consist of discrete tracks each with its own instrument.
In accordance with the invention, the playback of the music work can be modified at step 212 (or at step 231 of Figure 2d) by the soundtrack adapter 1121 at least in the following ways: muting or unmuting a track and therefore an instrument; changing the arrangement from one genre (e.g. grunge) to another (e.g. jazz); changing the playback from one song part to another (to facilitate this, the start positions of song parts such as intro, verse, chorus, and coda may be annotated e.g. in the metadata section of the digital audio files described in more detail below); adding or removing an effect such as distortion or echo; increasing or decreasing the level of an effect such as distortion or echo; increasing or decreasing the playback speed or tempo; and increasing or decreasing the volume of the vocals compared to the background song. Further in accordance with the invention, modification of a music work to be used in the karaoke performance may comprise selecting a suitable accompaniment track according to the context information. For example, there may be several versions of each musical work, each played with a different musical style. When contextual information, such as performer profession, is obtained, an accompaniment track of an associated style may be selected. For example, a synth pop style accompaniment track may be selected for a performer who is an engineer by profession.
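The listed playback modifications can be sketched against a hypothetical multitrack data model; a real implementation would drive a MIDI or multitrack audio engine, and all names below are assumptions rather than details of the publication:

```python
# Sketch of soundtrack-adapter operations on a multitrack music work.
# The data model is hypothetical and purely illustrative.

class MusicWork:
    def __init__(self, tracks, tempo, song_parts):
        self.muted = {name: False for name in tracks}  # track name -> muted?
        self.tempo = tempo                             # beats per minute
        self.song_parts = song_parts                   # annotated part starts
        self.position = 0.0                            # current playback position

    def set_muted(self, track, muted):
        self.muted[track] = muted                      # mute/unmute an instrument

    def change_tempo(self, factor):
        self.tempo *= factor                           # speed playback up or down

    def jump_to_part(self, part):
        self.position = self.song_parts[part]          # e.g. annotated chorus start

work = MusicWork(["drums", "bass", "extra_drums"], tempo=120,
                 song_parts={"intro": 0.0, "verse": 8.0, "chorus": 24.0})
work.set_muted("extra_drums", True)    # extra drums start muted
work.set_muted("extra_drums", False)   # voice command "more drums": unmute
work.change_tempo(1.1)                 # voice command "take it faster"
work.jump_to_part("chorus")            # voice command "take me to the chorus"
print(work.position)
```

The annotated `song_parts` dictionary corresponds to the part-start metadata mentioned in the text.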
Further in accordance with the invention, a set of predefined voice commands, such as e.g. "more drums", "less brass", "take me to the chorus", "take it faster", and "one more time boys" may be provided, for example in the voice command storage 1240. The voice commands may be shown e.g. on the display 1510 so as to allow the performer to learn them.
During a karaoke performance, the apparatus 1100 of the invention observes - via speech recognition performed (step 210) by the speech recognizer 1130 - the words or phrases sung or spoken by the performer. When the context information obtainer 1110 detects (step 211) that the performer has given a voice command included in the above set of predefined voice commands, the soundtrack adapter 1121 executes a predetermined action associated with the detected voice command. For example, when the user exclaims "more drums", the soundtrack adapter 1121 may unmute a track with additional drumming. As another example, when the performer shouts "take me to the chorus", the playback may jump to the beginning of the chorus part of the current music work. Rules for modifying the playback may be provided, for example in the rules storage 1230 which may be e.g. a data file arranged in the hard drive (not shown in Figure 1) of the personal computer 1000. These rules may define e.g. that the voice command "more drums" is executed by activating a certain track of the respective MIDI or multitrack digital audio file. Additionally, the apparatus 1100 may also have capability to support the singer e.g. in case the singer forgets the melody of the song. The predefined voice command for this functionality may be e.g. "help me". When this command is activated, the soundtrack adapter 1121 may e.g. play the melody as an instrumental or vocal version.
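The voice-command handling above might be sketched as a lookup from recognized phrases (in the spirit of the voice command storage 1240) to playback actions, with phrases that also occur in the lyrics filtered out; the command-to-action encoding is an illustrative assumption:

```python
# Sketch: map predefined voice commands to playback actions. The command
# strings follow the examples in the text; the action tuples are illustrative.

VOICE_COMMANDS = {
    "more drums": ("unmute_track", "extra_drums"),
    "less brass": ("mute_track", "brass"),
    "take me to the chorus": ("jump_to_part", "chorus"),
    "take it faster": ("change_tempo", 1.1),
}

def detect_command(recognized_phrase, lyrics_words):
    """Return the action for a recognized phrase, ignoring phrases that
    also occur verbatim in the lyrics (to avoid unwanted modifications)."""
    if recognized_phrase in lyrics_words:
        return None
    return VOICE_COMMANDS.get(recognized_phrase)

print(detect_command("more drums", lyrics_words={"take it faster"}))
```

The lyrics check corresponds to the comparison against the music work's lyrics described in the next paragraph.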
Furthermore, in case there are similar words in the voice commands as in the music work's lyrics, the apparatus 1100 may compare the words included in the lyrics to the recognized words to facilitate avoiding unwanted modifications to the playback.
Optionally, a dedicated button or other such means may be provided which the user activates when issuing a voice command. This way the user can indicate explicitly to the apparatus 1100 that the words uttered while the button is pressed represent a voice command instead of normal singing. This is likely to improve the performance as the apparatus 1100 does not have to continuously record the sound input and compare it to the key word vocabulary but rather receives an explicit indication of a key word being uttered. Similarly, this kind of button or other such means could optionally be used to indicate when the user wishes to modify a word in the lyrics. Thus, the user could activate the button before singing the replacing lyrics portion.
In the embodiment illustrated in Figure 2c, words being vocalized by a karaoke performer performing the current karaoke performance are speech recognized, step 220. At step 221, it is determined whether a predetermined lyrics portion is speech recognized as having been vocalized with a replacing lyrics portion. If not, the method returns to step 220 wherein speech recognition is continued. If yes, at least one predetermined lyrics portion of the current karaoke performance is substituted with the vocalized replacing lyrics portion when next displayed, step 222.
The embodiment of the invention illustrated in Figure 2c allows the karaoke performer to change - while still simultaneously performing the current song - at least parts of the original lyrics shown on the display 1510. In an embodiment, predetermined categories of words may be changed by the performer. For example, the lyrics of many songs contain names of people and places. The karaoke performer may e.g. change the name of a person or a place that is mentioned in the music work.
This may be implemented by allowing certain categories of words, e.g. the names of people and places, to be replaced with other words. This may be implemented so that the apparatus 1100 of the invention recognizes the words sung or spoken by the performer via speech recognition performed (step 220) by the speech recognizer 1130, compares these words to the original lyrics (step 221) , and if some words (of those that are allowed to be changed) are different, they are substituted in the lyrics that are shown on the screen whenever that word appears again during the karaoke performance (step 222). The apparatus 1100 may be configured to present the words that are allowed to be replaced e.g. with a special text colour, on the display 1510.
Optionally, the system may provide a predetermined number of alternatives for each word that can be substituted. This can be implemented to limit the amount of alternative words to be speech recognized and increase the robustness of the speech recognition e.g. in a noisy environment. In this case, the system is arranged to recognize the vocalized word from the limited set of possible alternative words. The alternatives for words can be obtained using the collected context information: for example, the names from the Contacts list of the performer may be provided as alternatives for person names in the lyrics, names of places visited by the performer may be provided as alternatives for names of places, and so on. The alternatives may be shown to the performer who may then vocalize one of the alternatives, either the original or one of the changed ones. Next time the word occurs in the lyrics it is replaced with the version vocalized by the user. Lyrics for the various songs may be provided e.g. in the lyrics storage 1220 which may be e.g. a text file arranged in the hard drive (not shown in Figure 1) of the personal computer 1000. In the lyrics storage 1220, the names of people and places may be tagged in a suitable way, for example. There may be a tag for each category of words that are allowed to be replaced. In addition, the modifiable words storage 1250 (e.g. a suitable database) may be provided which contains words that are allowed replacements. These words may be organized e.g. on the basis of the above tags.
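The tagging and substitution scheme might look like the following sketch, where the tag syntax, category names, and context source are illustrative assumptions rather than details of the publication:

```python
# Sketch: lyrics with tagged replaceable words (marked <person>...</person>
# here), alternatives drawn from the performer's contacts list, and
# substitution of all later occurrences once a replacement is vocalized.

lyrics_line = "Oh <person>Mary</person>, won't you sing with <person>Mary</person>"

def alternatives_for(category, context):
    # e.g. contact names serve as alternatives for person names
    return context.get(category, [])

def substitute(lyrics, category, original, replacement, allowed):
    """Replace a tagged word, accepting only words from the limited
    set of alternatives (which also constrains the recognizer)."""
    if replacement not in allowed:
        return lyrics
    tag = f"<{category}>{original}</{category}>"
    new = f"<{category}>{replacement}</{category}>"
    return lyrics.replace(tag, new)

context = {"person": ["Anna", "Pekka"]}   # from the Contacts list
allowed = alternatives_for("person", context)
print(substitute(lyrics_line, "person", "Mary", "Anna", allowed))
```

A display layer could render the tagged words in a special colour, as suggested for the display 1510.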
Optionally, the apparatus 1100 may be configured to recognize when a masculine name is replaced in vocalization with a feminine name (or vice versa) , and to consequently substitute "he" with "she" (or vice versa) in the displayed lyrics.
In the embodiment illustrated in Figure 2d, external input data is fetched or received for use as the performance context information, step 230. In response, at least one soundtrack of the current karaoke performance is adapted based on the fetched or received external input data, step 231.
In other words, the apparatus 1100 of the invention detects using the context information obtainer 1110 the context (i.e. the circumstances and conditions) surrounding the karaoke performance situation (step 230) , and adapts using the soundtrack adapter 1121 the background music soundtracks on the basis of the properties of the detected performance context (step 231) .
As described above, this context information may be received as external input data e.g. via a keyboard, a mouse or another input device (not shown in Figure 1) associated with the personal computer 1000; or via a communications network (not shown in Figure 1), such as the Internet or a local area network connected to the personal computer 1000; or e.g. from an external device, such as the sensor device 1300, the mobile device 1610 associated with the performer, or at least one of the mobile devices 1620 and 1630 associated with audience participants. Sensor devices 1300 that can be used include e.g. a microphone, a video camera, a positioning device (e.g. a Global Positioning System device), as well as illumination, humidity, temperature, blood pressure and heart rate meters. These sensor devices provide data to the context information obtainer 1110 that can then determine properties of the context of the karaoke performance. The context information obtainer 1110 may also access personal information manager data (including presence service data, calendar data and contacts list data, as described above) on the mobile device 1610 of the karaoke performer and/or the mobile devices 1620-1630 of the audience participants via e.g. Bluetooth, and use this data to determine the performance context. The context information obtainer 1110 may also access other personal data, such as a location where some of the participants' pictures have been taken, to determine e.g. the latest travel destination. The context information obtainer 1110 may also access information on the media files on the performer's and participants' mobile devices 1610-1630, such as their favourite songs, artists, and/or music genres. The context information obtainer 1110 may also obtain context information from an external server, e.g. by querying with the GPS location. As an example, the external server might search for musical artists and bands who have lived close to the current location (e.g. in the same city), or who have some other relation to the particular location, and provide information on the styles and genre of their music to be used when adapting the musical performance.
Properties of the performance context may include, for example, the time of day/month/year, the people who participate in the karaoke performance or who are in the audience, the number of people, the physiological state of the karaoke performers, the background of the people (e.g. their gender, what their day was like, what people they know, etc.), and a location where they have been lately travelling. This information may then be used to modify the properties of the music.
The sensors may be attached to the apparatus 1100 or to remote devices (such as the mobile devices 1610-1630) that transfer the data to the apparatus 1100. For example, some people may have heart rate meters in their wristwatches and the apparatus 1100 may access the heart rate data directly from the watch or via a mobile device.
A set of rules may be provided, e.g. in the rules storage 1230, that define how the music is adapted on the basis of the properties of the context. In general, the playback of the background music may be modified in ways similar to those described above in connection with step 212 of Figure 2b.
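Such a rules storage might, purely as an illustration, map context predicates to adaptation actions; the thresholds and action encoding below are assumptions, not rules from the publication:

```python
# Sketch: a rules storage mapping detected context properties to playback
# adaptations, in the spirit of the rules storage 1230. Thresholds and
# actions are illustrative.

RULES = [
    # (predicate over the context, adaptation action)
    (lambda ctx: ctx.get("heart_rate", 0) > 100, ("change_tempo", 0.9)),
    (lambda ctx: ctx.get("hour", 12) >= 22,      ("set_volume", "lower")),
]

def adaptations_for(context):
    """Collect every adaptation whose predicate matches the context."""
    return [action for predicate, action in RULES if predicate(context)]

print(adaptations_for({"heart_rate": 120, "hour": 23}))
```

The selected actions would then be executed by the soundtrack adapter 1121, in the same ways as described for step 212.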
More particularly, the playback of the background music may be adapted on the basis of the properties of the detected performance context for example in the following ways:
[Table of performance context properties and corresponding music adaptations, reproduced as images imgf000020_0001 to imgf000023_0001 in the original publication; only its final rows survive in the text and are rendered below.]

For example, the music may be played with a slower tempo to calm the performer and the audience down, or alternatively with a faster tempo to better reflect his or the audience's feelings.

Favourite or most frequently played artist/album/song/music genre of the performer or the participants: the music arrangement may be modified to resemble the favourite songs or the favourite music genre. For example, a solo portion in the arrangement may be replaced with one from the favourite song of the performer, or the arrangement may follow the style common for the favourite genre of the performer or most of the participants.

Alternatively, the favourite music tracks of the user or the audience may be suggested to be used for the karaoke performance.

Alternatively, the system may suggest the karaoke performance to consist of a potpourri of the favourite songs of the performer: in this case the system plays parts of the favourite songs of the performer. [The table row continues as image imgf000025_0001 in the original publication.]
Optionally, the system may request more than one karaoke participant (members of the audience) to participate in the karaoke performance. For example, one or more audience members may participate by singing the chorus parts of songs. The audience members may sing the chorus part to the microphones of their mobile terminals, which is then mixed with the accompaniment and singing of the karaoke performer. With the same principle, audience members may also provide the backing vocals while the main karaoke performer is singing the lead melody. The backing vocals may be processed with effects like pitch shift in order to create a complex choir sound. Furthermore, the vocals of each background singer may be processed differently. For example, female backing vocals may be shifted down by an octave in pitch to make them sound more masculine.
Related to the last example in the table above, the system may construct the karaoke play list from the favorite songs of the karaoke participants. The play list may consist of whole songs, or alternatively the system may play potpourris of complete songs, such that it concatenates parts (e.g. choruses or verses) of the favorite songs of the participants in a seamless fashion, and transitions between songs occur without breaks. Each participant may furthermore be requested to sing his favorite song in turn. In this case, the karaoke performer is changed during the performance. When the favorite song of participant X starts to play, the karaoke system may activate the microphone of the mobile device of participant X, start mixing the audio input from participant X's microphone to the music accompaniment, and mute the microphone of the previous participant. Optionally, the system may send an indication to participant X's mobile terminal to let him know that it is his turn to sing. This indication may be e.g. a visual indication, such as text on the screen of participant X's mobile terminal, or e.g. a vibration alarm.
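The potpourri play list and performer hand-over described above might be sketched as follows; the mixer representation and the notification text are illustrative assumptions:

```python
# Sketch: build a potpourri play list from the participants' favourite
# songs and switch the active microphone when the play list advances.
# The mixer is a plain dictionary standing in for a real audio mixer.

def build_potpourri(favourites):
    """favourites: participant -> favourite song. Returns (song, singer) pairs."""
    return [(song, participant) for participant, song in favourites.items()]

def on_song_change(playlist, index, mixer):
    """Activate the new singer's mobile microphone and queue a notification
    (muting the previous singer is implied by replacing active_mic)."""
    song, singer = playlist[index]
    mixer["active_mic"] = singer
    mixer["notify"] = f"{singer}, now it's your favorite song playing"
    return mixer

playlist = build_potpourri({"X": "Song A", "Y": "Song B"})
mixer = on_song_change(playlist, 1, {"active_mic": None, "notify": None})
print(mixer["active_mic"])
```

The `notify` entry corresponds to the visual indication or vibration alarm sent to the participant's terminal.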
Optionally, the system may also show a text in the main karaoke screen, such as "Mr X, now it's your favorite song playing, please start singing" . Before participant X notices that it is his turn to sing, the system may loop the first few measures of the current song, until the requested participant sings the first few phrases to the microphone.
Optionally, the karaoke device may give points to the participants based on how alert they were: participants may receive scores depending on the speed with which they started singing, especially in the case when the karaoke play list is made as a potpourri of the favorite songs of the karaoke participants. The best scores and/or response times may be shown on the video screen to reward the most alert (fastest responding) participants. Optionally, the system may show the video picture of the currently active participant on the main screen of the karaoke performance. When the karaoke song and performer are changed without breaks, the video stream from the current performer's mobile device is used. When several participants are singing the chorus part, the system may show the video streams from the mobile devices of all the singers singing the chorus. Moreover, the system may also show a video where the main video on the background shows the original music performers performing the music track, but where parts of the original music video are replaced with video streams from participants' mobile devices. For example, the head of the main singer in the original music video may be replaced with the video stream showing the karaoke performer's head. When several participants are singing the chorus part, the heads of the chorus singers may be replaced with the video streams showing the faces of the karaoke participants currently singing the chorus. Alternatively, still pictures representing the karaoke participants may be portrayed on top of the heads of the performers on the original karaoke video.
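The alertness scoring might, as one hypothetical formula, deduct points per second of response delay; neither the formula nor its constants are from the publication:

```python
# Sketch: score participants by how quickly they started singing after
# their song began; a faster response gives a higher score.

def alertness_score(response_seconds, max_score=100, penalty_per_second=10):
    """Deduct points per second of delay, never going below zero."""
    return max(0, max_score - int(response_seconds * penalty_per_second))

responses = {"X": 1.5, "Y": 4.0}                      # seconds until singing started
scores = {p: alertness_score(t) for p, t in responses.items()}
best = max(scores, key=scores.get)                    # fastest responder wins
print(scores, best)
```

The resulting scores and the best responder could then be shown on the video screen as described.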
In the embodiment illustrated in Figure 2e, external input data is fetched or received for use as the performance context information, step 240. In response, at least one predetermined lyrics portion of the current karaoke performance is substituted when next displayed based on the fetched or received external input data, step 241.
In other words, the apparatus 1100 of the invention detects using the context information obtainer 1110 the context (i.e. the circumstances and conditions) surrounding the karaoke performance situation (step 240), and adapts using the lyrics adapter 1122 the lyrics on the basis of the properties of the detected performance context (step 241). Again, various devices described above in connection with step 230 of Figure 2d may be used to obtain the performance context in step 240 of Figure 2e. Also, the context information itself may be similar to that used in connection with the embodiment of Figure 2d. This time, however, it is utilized to adapt or modify the displayed lyrics. Again, there may be provided categories of words that are allowed to be changed in the lyrics of a music work. Such categories may include e.g. names of people and places.
The lyrics may be adapted on the basis of the properties of the detected context for example in the following ways :
[Table of performance context properties and corresponding lyrics adaptations, reproduced as images imgf000028_0001 and imgf000029_0001 in the original publication; only its final rows survive in the text and are rendered below.]

Performer's previous karaoke history: the person's previous karaoke history may affect the result in new karaoke songs; e.g. if the user has sung the same song many times previously, it may have more radical changes compared to a new song.

Performer's favourite music, or other media content: when the system knows the favourite music or other media content (e.g. poem, book, movie) of the performer, some of the lyrics in the karaoke song may be changed with a phrase from the favourite media works.
Optionally, the audience may affect the displayed lyrics by sending words from the mobile devices 1620-1630 e.g. via Bluetooth or Short Message Service (SMS) to the apparatus 1100. For example, the audience may affect the content of the lyrics in the karaoke performance in the following way: certain words in the karaoke lyrics are tagged such that they can be changed. This tagging may have been performed e.g. in advance at some point when the karaoke content was made. Assuming now that e.g. some or all nouns in the lyrics have been tagged to allow substitutions, the audience may then participate in the karaoke performance by sending, e.g. via their mobile devices 1620-1630, words that they would like to include in the karaoke performance. In this example, the people send nouns to the apparatus 1100. The apparatus 1100 maintains a list of the nouns received from the audience, and as the karaoke performance progresses, replaces a noun in the lyrics with a noun provided by the audience. Optionally, the apparatus 1100 may go through the list of nouns provided by the audience and select such new words that will rhyme. Naturally, other word categories may be replaced as well, such as verbs and proper nouns. There may be an option to allow the audience to select which word category (such as noun, verb, proper noun) the word belongs to by selection from a menu. The corresponding word categories may have been tagged in the lyrics as well. In this case, the apparatus 1100 will replace a word of the lyrics in a given category with a word provided by the audience in the same category. For example, every proper noun (such as person names) is replaced with a person name provided by the audience. In this case, the names may also be taken from the contact list applications of the participants, such that e.g. every person name in the lyrics is replaced with the name of some participant in the karaoke audience.
The apparatus 1100 may also be configured to automatically recognize the word categories from the lyrics and the words that people send. In this case, the apparatus 1100 may automatically pick words from the correct word category to replace words in the lyrics.
Furthermore, when multiple suggestions for word replacements are received, the apparatus 1100 may choose the word randomly, let the audience vote for the best solution, or choose a best matching word for the context. The event organizer, for example, may decide which of these is the preferred method for word selection.
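The audience-word substitution with tagged categories and random selection among suggestions might be sketched as below; the token representation and category names are illustrative assumptions:

```python
# Sketch: replace tagged words in the lyrics with audience-submitted
# words of the same category, choosing randomly among the suggestions
# (one of the selection methods named in the text).
import random

def replace_tagged(lyrics_tokens, suggestions, rng):
    """lyrics_tokens: list of (word, category-or-None) pairs.
    suggestions: category -> list of audience-submitted words."""
    out = []
    for word, category in lyrics_tokens:
        pool = suggestions.get(category, [])
        out.append(rng.choice(pool) if pool else word)
    return out

tokens = [("my", None), ("cat", "noun"), ("sings", None)]
suggestions = {"noun": ["beer"]}   # e.g. received via SMS or Bluetooth
print(replace_tagged(tokens, suggestions, random.Random(0)))
```

Voting or rhyme-based selection would simply replace the `rng.choice` step with a different policy.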
In addition, the audience may receive information from the apparatus 1100 about the words that can be changed, or about the categories of the words that can be changed. The apparatus 1100 may deliver this information e.g. via Short Message Service, Bluetooth, or via a web browser. For example, the apparatus 1100 may return a list of words that can be changed, or the whole lyrics that show changeable lyrics with special colors. As yet another example, the apparatus 1100 may send the audience information that the audience should now send nouns or verbs to the apparatus 1100.
If the audience knows the lyrics of the song, or the apparatus 1100 sends this information to the audience, the audience may control which words are replaced by which. For example, a member of the audience might send a command "replace cat -> beer" to explicitly force the word cat to be replaced with the word beer. Alternatively, the members of the audience could e.g. take a look at a web page provided by the apparatus 1100 with the lyrics written down and text boxes in the place of words that can be changed, write the alternative words to the boxes and then submit the information as a web form to the apparatus 1100.
Furthermore, the audience may be allowed to affect the balance of the musical performance in addition to being able to adjust individual parameters of different instrument tracks, such as echo, distortion, delay, chorus etc.
The actual adjusting can be implemented so that an audience member is able to list the available parameters that can be modified e.g. by utilizing a Bluetooth connection to the host computer. When a parameter is selected for modifying, a suitable user interface or UI is transferred to the user's mobile device (or computer). The UI may contain e.g. sliders for controlling the amount of distortion for an instrument track. The modifications are applied right away so that the user can hear the result of his actions immediately. Changing the balance of the musical performance works in a similar manner: a UI is sent to the user's mobile device. The UI includes volume sliders for each selected track of music. In addition, an audience member could affect the balance of the musical performance by sending SMS messages with simple instructions like "more bass" or "less echo".
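The simple SMS instructions might be interpreted, as an illustrative sketch, by mapping "more"/"less" to parameter increments with clamping; the step size and parameter set are assumptions:

```python
# Sketch: interpret SMS instructions such as "more bass" or "less echo"
# as clamped parameter adjustments on the mix.

def apply_instruction(instruction, params, step=0.1):
    """Parse '<more|less> <parameter>' and nudge the parameter,
    keeping it within the range 0.0..1.0."""
    direction, name = instruction.split()
    if name not in params:
        return params                       # ignore unknown parameters
    delta = step if direction == "more" else -step
    params[name] = min(1.0, max(0.0, params[name] + delta))
    return params

params = {"bass": 0.5, "echo": 0.5}
apply_instruction("more bass", params)
apply_instruction("less echo", params)
print(params)
```

A slider UI would manipulate the same parameter dictionary directly instead of going through the text parser.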
Furthermore, the system may optionally change the audio signal that comes from the user microphone and goes to the speakers. For example, when the user sings "Bill", the system may modify the audio signal so that the audience can hear "Jill" from the speakers. The audience may specify which words are changed through their devices that are connected to the system, in the same way as specifying the lyrics that are displayed. The system may synthesize singing of the audience defined word and mix it into the output instead of the singer output, or it may filter the singer output in order to change certain phonemes to make the singing sound like the words specified by the audience. The above described lead vocals and backing vocals modifications may be performed in response to requests received from the audience members.
The exemplary embodiments can include, for example, any suitable servers, workstations, game consoles, personal computers, karaoke devices, mobile devices, and the like, capable of performing the processes of the exemplary embodiments. The devices and subsystems of the exemplary embodiments can communicate with each other using any suitable protocol and can be implemented using one or more programmed computer systems or devices.
One or more interface mechanisms can be used with the exemplary embodiments, including, for example, Internet access, telecommunications in any suitable form (e.g., voice, modem, and the like), wireless communications media, and the like. For example, employed communications networks or links can include one or more wireless communications networks, cellular communications networks, 3G communications networks, Public Switched Telephone Networks (PSTNs), Packet Data Networks (PDNs), the Internet, intranets, a combination thereof, and the like.
It is to be understood that the exemplary embodiments are for exemplary purposes, as many variations of the specific hardware used to implement the exemplary embodiments are possible, as will be appreciated by those skilled in the hardware and/or software art(s). For example, the functionality of one or more of the components of the exemplary embodiments can be implemented via one or more hardware and/or software devices. The exemplary embodiments can store information relating to various processes described herein. This information can be stored in one or more memories, such as a hard disk, optical disk, magneto-optical disk, RAM, and the like. One or more databases can store the information used to implement the exemplary embodiments of the present inventions. The databases can be organized using data structures (e.g., records, tables, arrays, fields, graphs, trees, lists, and the like) included in one or more memories or storage devices listed herein. The processes described with respect to the exemplary embodiments can include appropriate data structures for storing data collected and/or generated by the processes of the devices and subsystems of the exemplary embodiments in one or more databases.
All or a portion of the exemplary embodiments can be conveniently implemented using one or more general purpose processors, microprocessors, digital signal processors, micro-controllers, and the like, programmed according to the teachings of the exemplary embodiments of the present inventions, as will be appreciated by those skilled in the computer and/or software art(s). Appropriate software can be readily prepared by programmers of ordinary skill based on the teachings of the exemplary embodiments, as will be appreciated by those skilled in the software art. Further, the exemplary embodiments can be implemented on the World Wide Web. In addition, the exemplary embodiments can be implemented by the preparation of application-specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be appreciated by those skilled in the electrical art(s). Thus, the exemplary embodiments are not limited to any specific combination of hardware and/or software.
Stored on any one or on a combination of computer readable media, the exemplary embodiments of the present inventions can include software for controlling the components of the exemplary embodiments, for driving the components of the exemplary embodiments, for enabling the components of the exemplary embodiments to interact with a human user, and the like. Such software can include, but is not limited to, device drivers, firmware, operating systems, development tools, applications software, and the like. Such computer readable media further can include the computer program product of an embodiment of the present inventions for performing all or a portion (if processing is distributed) of the processing performed in implementing the inventions. Computer code devices of the exemplary embodiments of the present inventions can include any suitable interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes and applets, complete executable programs, Common Object Request Broker Architecture (CORBA) objects, and the like. Moreover, parts of the processing of the exemplary embodiments of the present inventions can be distributed for better performance, reliability, cost, and the like.
As stated above, the components of the exemplary embodiments can include computer readable medium or memories for holding instructions programmed according to the teachings of the present inventions and for holding data structures, tables, records, and/or other data described herein. Computer readable medium can include any suitable medium that participates in providing instructions to a processor for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, transmission media, and the like. Non-volatile media can include, for example, optical or magnetic disks, magneto-optical disks, and the like. Volatile media can include dynamic memories, and the like. Transmission media can include coaxial cables, copper wire, fiber optics, and the like. Transmission media also can take the form of acoustic, optical, electromagnetic waves, and the like, such as those generated during radio frequency (RF) communications, infrared (IR) data communications, and the like. Common forms of computer-readable media can include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other suitable magnetic medium, a CD-ROM, CD-R, CD-RW, DVD, DVD-ROM, DVD±RW, DVD±R, any other suitable optical medium, punch cards, paper tape, optical mark sheets, any other suitable physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other suitable memory chip or cartridge, a carrier wave, or any other suitable medium from which a computer can read.
While the present inventions have been described in connection with a number of exemplary embodiments and implementations, the present inventions are not so limited, but rather cover various modifications and equivalent arrangements, which fall within the purview of prospective claims.

Claims

WHAT IS CLAIMED IS:
1. A method, comprising: obtaining context information related to a current karaoke performance, and modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
2. The method according to claim 1, further comprising: speech recognizing words being vocalized by a karaoke performer performing the current karaoke performance.
3. The method according to claim 2, wherein the obtaining of the context information further comprises obtaining the context information by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
4. The method according to claim 2, wherein the obtaining of the context information further comprises obtaining the context information by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.
5. The method according to any of the claims 1 - 4, wherein the obtaining of the context information further comprises obtaining the context information as external input data.
6. The method according to claim 5, wherein the obtaining of the context information as the external input data further comprises obtaining the context information as the external input data from an external device.
7. The method according to claim 6, wherein the obtaining of the context information as the external input data from the external device further comprises obtaining the context information as external performance sensor data.
8. The method according to claim 6 or 7, wherein the obtaining of the context information as the external input data from the external device further comprises obtaining the context information as personal information manager data associated with at least one participant of the current karaoke performance.
9. The method according to any of the claims 1 - 8, wherein the modifying of the at least one audiovisual output aspect further comprises adapting at least one soundtrack of the current karaoke performance based on the obtained context information.
10. The method according to any of the claims 1 - 9, wherein the modifying of the at least one audiovisual output aspect further comprises substituting at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
11. The method according to any of the claims 1 - 10, wherein the method is performed by a data-processing device controlled by a computer program embodied on a computer readable medium.
12. An apparatus, comprising: a context information obtainer configured to obtain context information related to a current karaoke performance, and a karaoke output modifier configured to modify at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
13. The apparatus according to claim 12, further comprising: a speech recognizer configured to speech recognize words being vocalized by a karaoke performer performing the current karaoke performance.
14. The apparatus according to claim 13, wherein the context information obtainer is further configured to obtain the context information by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
15. The apparatus according to claim 13, wherein the context information obtainer is further configured to obtain the context information by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.
16. The apparatus according to any of the claims 12 - 15, wherein the context information obtainer is further configured to obtain the context information as external input data.
17. The apparatus according to claim 16, wherein the context information obtainer is further configured to obtain the context information as the external input data from an external device.
18. The apparatus according to claim 17, wherein the context information obtainer is further configured to obtain the context information as external performance sensor data from an external performance sensor device.
19. The apparatus according to claim 17 or 18, wherein the context information obtainer is further configured to obtain the context information as personal information manager data associated with at least one participant of the current karaoke performance.
20. The apparatus according to any of the claims 12 - 19, wherein the karaoke output modifier further comprises a soundtrack adapter configured to adapt at least one soundtrack of the current karaoke performance based on the obtained context information.
21. The apparatus according to any of the claims 12 - 20, wherein the karaoke output modifier further comprises a lyrics adapter configured to substitute at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
22. An apparatus, comprising: a context information obtaining means for obtaining context information related to a current karaoke performance, and a karaoke output modifying means for modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
23. The apparatus according to claim 22, further comprising: a speech recognizing means for speech recognizing words being vocalized by a karaoke performer performing the current karaoke performance.
24. The apparatus according to claim 23, wherein the context information obtaining means is further adapted for obtaining the context information by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
25. The apparatus according to claim 23, wherein the context information obtaining means is further adapted for obtaining the context information by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.
26. The apparatus according to any of the claims 22 - 25, wherein the context information obtaining means is further adapted for obtaining the context information as external input data.
27. The apparatus according to claim 26, wherein the context information obtaining means is further adapted for obtaining the context information as the external input data from an external device.
28. The apparatus according to claim 27, wherein the context information obtaining means is further adapted for obtaining the context information as external performance sensor data from an external performance sensor device.
29. The apparatus according to claim 27 or 28, wherein the context information obtaining means is further adapted for obtaining the context information as personal information manager data associated with at least one participant of the current karaoke performance.
30. The apparatus according to any of the claims 22 - 29, wherein the karaoke output modifying means further comprises a soundtrack adapting means for adapting at least one soundtrack of the current karaoke performance based on the obtained context information.
31. The apparatus according to any of the claims 22 - 30, wherein the karaoke output modifying means further comprises a lyrics adapting means for substituting at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
32. A computer program embodied on a computer readable medium, the computer program controlling a data-processing device to perform: obtaining context information related to a current karaoke performance, and modifying at least one audiovisual output aspect of the current karaoke performance based on the obtained context information.
33. The computer program according to claim 32, further controlling the data-processing device to perform: speech recognizing words being vocalized by a karaoke performer performing the current karaoke performance.
34. The computer program according to claim 33, further controlling the data-processing device to perform the obtaining of the context information by receiving indication of a predetermined key word included in the vocalized and speech recognized words.
35. The computer program according to claim 33, further controlling the data-processing device to perform the obtaining of the context information by receiving indication of a predetermined lyrics portion speech recognized as having been vocalized with a replacing lyrics portion.
36. The computer program according to any of the claims 32 - 35, further controlling the data-processing device to perform the obtaining of the context information by obtaining the context information as external input data.
37. The computer program according to any of the claims 32 - 36, further controlling the data-processing device to perform the modifying of the at least one audiovisual output aspect by adapting at least one soundtrack of the current karaoke performance based on the obtained context information.
38. The computer program according to any of the claims 32 - 37, further controlling the data-processing device to perform the modifying of the at least one audiovisual output aspect by substituting at least one predetermined lyrics portion of the current karaoke performance when next displayed based on the obtained context information.
39. A karaoke system, comprising: a microphone configured to receive vocalization by a karaoke performer performing a current karaoke performance, a karaoke device configured to produce a karaoke signal comprising a video portion and an audio portion, and to mix the vocalization received by the microphone with the audio portion, a speaker configured to output the mixed audio portion, a display configured to output the video portion, a context information obtainer configured to obtain context information related to the current karaoke performance, and a karaoke output modifier configured to modify at least one of the video portion and the mixed audio portion based on the obtained context information.
PCT/FI2007/000113 2007-04-27 2007-04-27 Modifying audiovisual output in a karaoke system based on performance context WO2008132265A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN200780052706.0A CN101652808A (en) 2007-04-27 2007-04-27 Modifying audiovisual output in a karaoke system based on performance context
PCT/FI2007/000113 WO2008132265A1 (en) 2007-04-27 2007-04-27 Modifying audiovisual output in a karaoke system based on performance context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/FI2007/000113 WO2008132265A1 (en) 2007-04-27 2007-04-27 Modifying audiovisual output in a karaoke system based on performance context

Publications (1)

Publication Number Publication Date
WO2008132265A1 true WO2008132265A1 (en) 2008-11-06

Family

ID=39925229

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2007/000113 WO2008132265A1 (en) 2007-04-27 2007-04-27 Modifying audiovisual output in a karaoke system based on performance context

Country Status (2)

Country Link
CN (1) CN101652808A (en)
WO (1) WO2008132265A1 (en)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103916433B (en) * 2013-01-04 2017-08-01 中兴通讯股份有限公司 A kind of karaoke data processing method, device, Internet of Things service platform and terminal
US9064484B1 (en) * 2014-03-17 2015-06-23 Singon Oy Method of providing feedback on performance of karaoke song
CN104219556A (en) * 2014-09-12 2014-12-17 北京阳光视翰科技有限公司 Use method of four-soundtrack karaoke identification playing system
US10918924B2 (en) 2015-02-02 2021-02-16 RLT IP Ltd. Frameworks, devices and methodologies configured to enable delivery of interactive skills training content, including content with multiple selectable expert knowledge variations
EP3254268A4 (en) * 2015-02-02 2018-07-18 GN IP Pty Ltd Frameworks, devices and methodologies configured to enable delivery of interactive skills training content, including content with multiple selectable expert knowledge variations
KR20180015648A (en) 2015-05-08 2018-02-13 지엔 아이피 피티와이 엘티디 Structure, apparatus and method configured to enable media data retrieval based on user performance characteristics obtained from automated classification and / or performance sensor units
CN109074752A (en) 2015-12-10 2018-12-21 Gn股份有限公司 It is configured as realizing the frame and method of the real-time adaptive transmission of skill training data based on user's performance is monitored by performance monitoring hardware
JP2019057123A (en) * 2017-09-21 2019-04-11 株式会社東芝 Dialog system, method, and program
CN109963092B (en) * 2017-12-26 2021-12-17 深圳市优必选科技有限公司 Subtitle processing method and device and terminal
CN108491464A (en) * 2018-03-05 2018-09-04 广东小天才科技有限公司 A kind of content delivery method and system based on personal recognition
CN110632898B (en) * 2018-06-22 2022-08-09 广州艾美网络科技有限公司 Deduction system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10282970A (en) * 1997-04-01 1998-10-23 Brother Ind Ltd Voice information display device
US20020147589A1 (en) * 2001-04-04 2002-10-10 Nec Viewtechnology, Ltd. Graphic display device with built-in speech recognition function
US20030003431A1 (en) * 2001-05-24 2003-01-02 Mitsubishi Denki Kabushiki Kaisha Music delivery system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103916521A (en) * 2013-01-05 2014-07-09 浪潮乐金数字移动通信有限公司 Mobile communication terminal
JP2019117282A (en) * 2017-12-27 2019-07-18 株式会社第一興商 Karaoke device
JP2019132979A (en) * 2018-01-31 2019-08-08 株式会社第一興商 Karaoke device
JP7041534B2 (en) 2018-01-31 2022-03-24 株式会社第一興商 Karaoke equipment
WO2021239285A1 (en) * 2020-05-29 2021-12-02 Sony Group Corporation Audio source separation and audio dubbing
JP7423164B2 (en) 2020-07-29 2024-01-29 株式会社第一興商 karaoke equipment

Also Published As

Publication number Publication date
CN101652808A (en) 2010-02-17

Similar Documents

Publication Publication Date Title
WO2008132265A1 (en) Modifying audiovisual output in a karaoke system based on performance context
US9847078B2 (en) Music performance system and method thereof
JP3632644B2 (en) Robot and robot motion pattern control program
Porcello " Tails Out": Social Phenomenology and the Ethnographic Representation of Technology in Music-Making
US9355634B2 (en) Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program stored thereon
JPWO2005062289A1 (en) Musical score display method using a computer
JP4174940B2 (en) Karaoke equipment
JP2014520352A (en) Enhanced media recording and playback
JP2007310204A (en) Musical piece practice support device, control method, and program
Dunaway The forgotten 1979 MOMA sound art exhibition
JP2023025013A (en) Singing support device for music therapy
Beggs et al. Designing web audio
JP2010049406A (en) Picture book image playback device, picture book image playback method, picture book image playback program and recording medium
CN113270080A (en) Chorus method, system, device, terminal and computer readable storage medium
JP6587459B2 (en) Song introduction system in karaoke intro
JP2006259474A (en) Karaoke device
KR102135117B1 (en) Method and system for generating a user-perceived structure of an audible sound in dependence upon user action within a virtual environment
JP2011154290A (en) Karaoke machine for supporting singing of music partially including solitary duet
WO2023112534A1 (en) Information processing device, information processing method, and program
JP2007233078A (en) Evaluation device, control method, and program
Ellis-Geiger Trends in Contemporary Hollywood Film Scoring
EP2483806A1 (en) Method for identifying and playing back an audio recording
JP2010224013A (en) Karaoke machine stopping performance with voice uttered by singer to microphone
JP2008209639A (en) Karaoke sound effect output system
JP2009244607A (en) Duet part singing generation system

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780052706.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07730582

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07730582

Country of ref document: EP

Kind code of ref document: A1