WO2007132427A1

WO2007132427A1 - Ringtone customization for portable telecommunication applications

Info

Publication number: WO2007132427A1
Application number: PCT/IB2007/051836
Authority: WO
Inventors: Laurent Lucat
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2006-05-17
Filing date: 2007-05-15
Publication date: 2007-11-22

Abstract

A ringtone customization system for use with a polyphonic synthesizer (115) of a portable electronic device (104) having a microphone (102) embedded therein. A user inputs audio (e.g. voice) data (100) via the microphone (102). The audio input data (100) is then formatted and used to replace one or more channels of a pre-recorded MIDI format ringtone. Thus, a customized ringtone (120) can be generated for playback by the synthesizer (115) without the need for an external computing device.

Description

Ringtone customization for portable telecommunication applications.

FIELD OF THE INVENTION

This invention relates generally to ringtone customization for portable telecommunication applications and, more specifically, to a system and method for use with a polyphonic synthesizer in a portable electronic device for ringtone customization, wherein the portable electronic device has a microphone for receiving audio data input by a user and a stored bank of pre-stored ringtones.

BACKGROUND OF THE INVENTION

Nowadays, most mobile phone handsets include a software or hardware polyphonic synthesizer for their ringtone feature. The typical achievable polyphony level is 16, 32 or 64 voices, depending on the platform capabilities. Most, if not all, of these synthesizers are compliant with the well-known MIDI (Musical Instrument Digital Interface) file format, as well as with GM (General MIDI) Level 1 or Level 2 capabilities. For instance, the GM Level 1 capability specifies that the synthesizer is able to address the 128 predefined melodic GM instruments, as well as a 47-sound drum kit.

Such devices are also increasingly becoming compliant with the well-known XMF (eXtensible Music Format), which basically consists of the combination of MIDI score data together with DLS (DownLoadable Sounds), the latter consisting of customized sound waveforms (sounds not belonging to the GM database). XMF (like other proprietary formats, such as Yamaha SMAF, for Synthetic music Mobile Application Format) allows one to play music with sounds other than the conventional GM ones. These sounds are designed by the XMF (or SMAF, etc.) content creator, who would typically be a person experienced in music and audio applications, with the help of content creation software running on a PC (Personal Computer). As a result, the end-user of the mobile device cannot personalize a ringtone by any means (the only option in this case is to select another ringing file).

US Patent Application Publication No. US2004/0120506A1 describes a system wherein an embedded microphone in a portable electronic device can be used to receive audio data input by a user, such as a cough or other inconspicuous sound, and then play back this audio data as a ringtone for event notification. However, there is no facility for customizing existing pre-recorded ringtones using audio data input by the user without the need for an external computing device on which to build the customized ringtone and from which the customized ringtone can then be downloaded to the portable electronic device.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a system wherein user-recorded audio data can be used to customize the audio output of a portable electronic device, without the need for an external computing device.

In accordance with the present invention, there is provided a ringtone customization system for use with a polyphonic synthesizer in a portable electronic device, said portable electronic device further comprising a microphone for receiving audio data input by a user, and a stored bank of pre-stored ringtones formatted for playback by said polyphonic synthesizer, the system comprising means for formatting the audio data input by a user for playback by said polyphonic synthesizer, and means for combining said audio data with at least a portion of a pre-stored ringtone to form a customized ringtone for playback by said polyphonic synthesizer.

Thus, a selected pre-recorded ringtone can be customized and personalized using audio data input via an embedded microphone, rather than requiring the use of an external computing device to create a customized ringtone for download onto the portable electronic device. In a preferred embodiment, a pre-recorded ringtone comprises a plurality of sound channels, one or more of which is substituted with said audio data input by the user to form said customized ringtone.

Means are beneficially provided for extracting audio data input by the user from background noise received simultaneously at said microphone. In a first exemplary embodiment, means may be provided for extracting and storing audio parameters from said audio data input by the user. Preferably, the analogue audio data is first digitized (e.g. by converting it to PCM, i.e. Pulse Code Modulation) before audio parameter extraction. In a preferred embodiment, the pre-recorded ringtones are in the known MIDI format and, in one exemplary embodiment, the polyphonic synthesizer may be wavetable-based and XMF compliant.

These and other aspects of the invention will be apparent from, and elucidated with reference to, the embodiments described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:

Figure 1 is a schematic block diagram illustrating the principal components of a system according to a first exemplary embodiment of the invention;

Figure 2 is a schematic block diagram illustrating the principal components of a system according to a second exemplary embodiment of the present invention;

Figure 3 is a schematic flow diagram illustrating the principal steps in building a DLS file in respect of audio data input by a user to the system of Figure 2; and

Figure 4 is a schematic flow diagram illustrating the principal steps in determining and storing user-sound timbre parameters in respect of audio data input by a user to the system of Figure 1.

DETAILED DESCRIPTION OF THE INVENTION

By way of background, the first polyphonic synthesizers embedded in highly constrained mobile devices such as mobile phones (before 2000) were hardware-implemented and frequency-modulation (FM) based (see e.g. the Yamaha MA-1 and MA-2 chipsets), whereas most embedded synthesizers are nowadays hardware- or software-implemented and based on wavetable technology.

Basically, the wavetable approach means that real instrument notes are captured and analyzed (in laboratories, at the design stage) in order to extract stationary fragments. In this way, it is not necessary to store, for example, four seconds of a trumpet sound in order to play back four seconds of the trumpet sound. Instead, by determining the start and end of a loop inside the recorded sound, it is possible to play the note by looping repeatedly between the start and end looping points. In this way, it is only necessary to store the samples up to the end-loop point. Depending on the stationary nature of the instrument sound, and on the targeted sound quality, this enables a significant memory saving.
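By way of illustration only, the looped playback described above can be sketched as follows. The function name `play_looped` and the six-sample note are hypothetical and are not taken from any particular synthesizer; the sketch merely shows that a note of arbitrary duration can be rendered from samples stored only up to the end-loop point.

```python
def play_looped(samples, loop_start, loop_end, n_out):
    """Render n_out samples from a fragment stored up to the end-loop point.

    Playback runs once through the attack portion and then iterates
    between loop_start (inclusive) and loop_end (exclusive).
    """
    if not 0 <= loop_start < loop_end <= len(samples):
        raise ValueError("invalid loop points")
    out = []
    pos = 0
    for _ in range(n_out):
        out.append(samples[pos])
        pos += 1
        if pos >= loop_end:  # end-loop point reached: jump back to the loop start
            pos = loop_start
    return out


# Hypothetical note: a 2-sample attack followed by a 4-sample loop.
stored = [0.0, 0.5, 1.0, 0.0, -1.0, 0.0]
rendered = play_looped(stored, loop_start=2, loop_end=6, n_out=12)
```

Here twelve output samples are produced from six stored samples; the same storage would serve a note of any length.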

Furthermore, it is not required to store the data for all notes of the instrument, since spectral content does not change significantly between two adjacent notes (in the chromatic scale), thereby further reducing the data storage required. In addition, a time-evolving slope can be extracted from the signal, based on the evolution of its power over time. Such information can be easily captured, and can afterwards drive the synthesis process, for instance according to an ADSR (Attack-Decay-Sustain-Release) parameter model.
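As a minimal sketch of how ADSR parameters can drive synthesis, the following builds a piecewise-linear gain curve. The linear segments, the parameter names and the sample-based durations are assumptions made for the example; a real synthesizer may use exponential segments or other units.

```python
def adsr_envelope(n_samples, attack, decay, sustain_level, release):
    """Piecewise-linear ADSR gain curve; attack, decay and release are in samples."""
    sustain = n_samples - attack - decay - release
    if min(attack, decay, release, sustain) < 0:
        raise ValueError("phases do not fit in n_samples")
    env = []
    for i in range(attack):                # rise from 0 towards 1
        env.append(i / attack)
    for i in range(decay):                 # fall from 1 towards the sustain level
        env.append(1.0 - (1.0 - sustain_level) * i / decay)
    env.extend([sustain_level] * sustain)  # hold the sustain level
    for i in range(release):               # fall from the sustain level towards 0
        env.append(sustain_level * (1.0 - i / release))
    return env


env = adsr_envelope(n_samples=10, attack=2, decay=2, sustain_level=0.5, release=2)
```

Multiplying a looped note sample by sample with such a curve gives the note its temporal shape.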

Some alternative approaches to the wavetable are also currently being investigated or developed, such as a parametric sinusoidal synthesizer by Philips. Basically, at the synthesizer design stage, instrument sounds are analyzed and modeled as a combination of time-evolving sinusoids (with variable frequencies and amplitudes) plus noise. As was the case with the wavetable approach, and for the same reasons, start and end looping points are determined, as well as temporal slope parameters. Only the model parameters have to be stored (e.g. frequencies, amplitudes, phases, loop start, loop end, ADSR slope parameters). Instrument sound synthesis can be obtained using only these stored parameters.
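The principle of synthesis from stored parameters can be illustrated, under simplifying assumptions, by summing sinusoids of fixed frequency, amplitude and phase. The sketch below omits the time variation of the sinusoids and the noise component mentioned above, and the function name and parameter values are hypothetical.

```python
import math


def synthesize(partials, sample_rate, n_samples):
    """Sum stationary sinusoids; partials is [(freq_hz, amplitude, phase_rad), ...]."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2.0 * math.pi * f * t + p)
                       for f, a, p in partials))
    return out


# Hypothetical stored parameters: a 220 Hz fundamental and two weaker harmonics.
voice = [(220.0, 1.0, 0.0), (440.0, 0.5, 0.0), (660.0, 0.25, 0.0)]
signal = synthesize(voice, sample_rate=8000, n_samples=8000)
```

Three triples are enough to regenerate one second of sound here, which is the storage advantage of the parametric approach.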

It will be appreciated that the detailed implementation of the proposed invention will depend on the configuration and capabilities of the components already embedded in the mobile terminal. More precisely, it will depend on the synthesis technology used by the polyphonic synthesizer and on its compliance with file formats such as XMF.

It is therefore useful to distinguish herein between two main exemplary configurations and the difference between the two configurations will lead to slightly different implementations, which will be discussed hereafter.

In a first configuration, the synthesizer is not wavetable-based or is not XMF compliant. The implementation architecture of the invention inside the mobile phone is summarized in the schematic drawing of Figure 1.

Input sound 100 (e.g. user singing voice) is captured by the microphone 102 embedded in a mobile phone 104 and converted into PCM by the analogue-to-digital converter (ADC) 106. A dedicated module 108 extracts audio parameters from the PCM samples by means of a known technique of user sound (or voice) analysis. By way of example, in the case of a sinusoidal-parametric synthesizer, harmonic frequencies, amplitudes and phases are extracted, as is done for coding purposes. Looping points (start and end of loops) are also determined, based, for example, on detection of (near-)cyclic periodicity in the signal, as well as temporal slope parameters (e.g. ADSR), based for instance on signal energy computation or on a-priori knowledge about the signal (for example, the user sound is assumed to be a singing-voice sound).

Then, the user-sound parameters are converted at module 110 to the synthesizer internal format, as for other GM devices, and stored in a user sound bank 112 (i.e. storage of user-sound timbre parameters: loop start, loop end, harmonic frequencies, amplitudes and phases, temporal envelope parameters such as pitch and amplitude modulation parameters) within the synthesizer module, in the same way as the other GM instruments. This closes a so-called "initialization process". Afterwards, for each MIDI file coming from the MIDI file database, the MIDI synthesizer substitutes one or more channels, or one or more GM instruments, of the MIDI melody with the "user-instrument" retrieved from the user sound bank 112.

When a ringtone is required to be played back, the required MIDI file is retrieved from a MIDI file database 114 and a required user sound is used as a substitute for one or more of the GM channels of the ringtone (as defined by the GM sound bank 113). The resultant signal generated by the MIDI polyphonic synthesis kernel 115 is converted to analogue format by the DAC 116, amplified at 118 and a user-customized ringtone 120 is played back by a loudspeaker 122.
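The channel substitution can be sketched, purely for illustration, on a simplified in-memory event list. The dictionary-based event representation, the name `substitute_channels` and the internal slot number `USER_PROGRAM` are all assumptions of this example and do not correspond to a real MIDI library or to the synthesizer's actual internal format; channels that carry no program-change event would need separate handling.

```python
USER_PROGRAM = 128  # hypothetical internal slot, outside the GM programs 0-127


def substitute_channels(events, channels, user_program=USER_PROGRAM):
    """Return a copy of events in which the chosen channels use the user instrument."""
    out = []
    for ev in events:
        if ev["type"] == "program_change" and ev["channel"] in channels:
            ev = dict(ev, program=user_program)  # copy, leaving the stored melody intact
        out.append(ev)
    return out


melody = [
    {"type": "program_change", "channel": 0, "program": 56},  # GM trumpet (0-based)
    {"type": "program_change", "channel": 1, "program": 32},  # GM acoustic bass (0-based)
    {"type": "note_on", "channel": 0, "note": 60, "velocity": 100},
]
custom = substitute_channels(melody, channels={0})
```

Because the stored melody is not modified, the same MIDI file can be played either in its original form or with the user instrument.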

In a second configuration, the synthesizer is wavetable-based and is XMF compliant. The implementation architecture of the invention inside the mobile phone is summarized by the schematic drawing of Figure 2.

Audio input data 200 (e.g. user singing voice) is captured by the microphone 202 embedded in the mobile phone 204 and converted into PCM by the analogue-to-digital converter (ADC) 206. A dedicated module 208 extracts basic waveforms from the signal, by determining looping points (start and end of loop). A temporal slope can also be extracted (e.g. using signal energy computation), leading e.g. to ADSR parameters, or it can be modeled based on a-priori knowledge about the signal (for instance, based on the fact that the user sound is assumed to be a singing-voice sound). Then, the waveforms are converted at module 210 into a suitable DLS format. This closes the "initialization process". Afterwards, for each MIDI file coming from the MIDI file database 214, the MIDI file is converted into XMF format at module 224, involving the inclusion of the "user-DLS" and the replacement of one or more GM instrument allocations (defined by the GM sound bank 213) with the user-instrument inside the XMF file, and the obtained XMF file is sent to the XMF synthesizer 215. The resultant signal is then converted to analogue at 216, amplified at 218 and a user-customized ringtone 220 is played back by loudspeaker 222 as before.

Referring to Figure 3 of the drawings, in the sound parameter extraction module 208 and DLS conversion module 210, signal calibration 300 basically consists of isolating the real audio signal content by cutting the first samples where the sound is not present (i.e. where there is only background noise), as well as the last background-noise samples after the sound. This can be enhanced by a known process, such as equalization, noise reduction etc. Amplitude normalization 302 is then applied to the signal in order to obtain suitable signal dynamics.
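A minimal sketch of signal calibration and amplitude normalization is given below. The fixed magnitude threshold used to decide where the sound begins and ends, and peak normalization as the normalization rule, are assumptions chosen for simplicity; the function names are hypothetical.

```python
def trim_background(samples, threshold):
    """Cut leading and trailing samples whose magnitude stays below threshold."""
    active = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not active:
        return []
    return samples[active[0]:active[-1] + 1]


def normalize_amplitude(samples, peak=1.0):
    """Scale the signal so that its largest magnitude equals peak."""
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        return list(samples)
    return [s * peak / largest for s in samples]


# Hypothetical recording: low-level background before and after the sound.
recording = [0.01, -0.02, 0.4, -0.2, 0.1, 0.01, 0.0]
clean = normalize_amplitude(trim_background(recording, threshold=0.05))
```

In practice the threshold would be derived from a measured background-noise level rather than fixed in advance.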

Pitch extraction 304 is a state-of-the-art process, as will be known to a person skilled in the art. Pitch normalization 306 is based on the pitch detected in the previous stage. It aims at shifting the signal pitch to a desired standard value (one of the notes of the musical scale) by any state-of-the-art process such as sample interpolation or resampling.
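One possible realization of these two stages, given here only as a sketch, estimates the pitch from the autocorrelation peak at an integer lag and then resamples by linear interpolation towards the nearest equal-tempered note. These particular techniques, the A4 = 440 Hz reference, the search range and all function names are assumptions of the example.

```python
import math


def estimate_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Return the frequency whose integer lag maximizes the autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def nearest_note_frequency(freq, a4=440.0):
    """Frequency of the equal-tempered note closest to freq."""
    semitones = round(12.0 * math.log2(freq / a4))
    return a4 * 2.0 ** (semitones / 12.0)


def resample(samples, ratio):
    """Linear-interpolation resampling; a ratio above 1 raises the pitch."""
    out = []
    n = 0
    while n * ratio < len(samples) - 1:
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        n += 1
    return out


sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
detected = estimate_pitch(tone, sample_rate)
target = nearest_note_frequency(detected)
shifted = resample(tone, target / detected)
```

For the 200 Hz test tone the nearest note lies slightly lower, so the resampled signal is slightly longer than the input.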

After that, a "clean" input sample is obtained: normalized in amplitude and pitch, and isolated from the background. Loop points (start and end) are determined at 308 using, for example, a signal autocorrelation process. An optimal loop will correspond to a maximal value of autocorrelation. Furthermore, since the signal has been normalized in pitch, one may exploit the fact that the loop is expected to match the pitch period of the sample, which will help the loop search. Signal truncation to the end-loop point 310 is trivial, as will be apparent to a person skilled in the art, and the result can be stored in PCM format.
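The loop search and truncation can be sketched as follows, assuming that the pitch period in samples is already known and that the loop is exactly one period long. Scoring each candidate by the normalized correlation between one period and the next is one simple choice among many; the function name and the test signal are hypothetical.

```python
import math


def find_loop(samples, period):
    """Pick the loop start whose one-period segment best matches the next period."""
    best_start, best_score = None, float("-inf")
    for start in range(0, len(samples) - 2 * period + 1):
        a = samples[start:start + period]
        b = samples[start + period:start + 2 * period]
        energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if energy == 0.0:
            continue  # silent segment: no meaningful correlation
        score = sum(x * y for x, y in zip(a, b)) / energy  # normalized correlation
        if score > best_score:
            best_start, best_score = start, score
    if best_start is None:
        raise ValueError("signal too short or silent")
    return best_start, best_start + period


# Hypothetical clean sample: a non-periodic attack followed by a steady tone.
period = 20
attack = [i / 20.0 for i in range(20)]
steady = [math.sin(2.0 * math.pi * i / period) for i in range(80)]
recording = attack + steady
loop_start, loop_end = find_loop(recording, period)
stored = recording[:loop_end]  # truncation to the end-loop point
```

The search settles inside the steady portion of the sound, and only the samples up to the end-loop point are kept.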

Envelope parameter extraction 312 consists of extracting the slopes and durations of the different phases of the signal. A typical ADSR (attack, decay, sustain, release) model may be used. The process can use signal local energy computation (state-of-the-art). Pitch and amplitude modulation parameter (typically corresponding to vibrato and tremolo) extraction 314 is optional. Instead, a default (typical) value can be used. If real extraction is desired, it can be based respectively on the temporal evolution of pitch and amplitude (state-of-the-art). The outputs from modules 308, 310, 312 and 314 are used to build a DLS file at 316, and the DLS file 318 is output.

The presented description of the first configuration in relation to Figure 1 takes the example of a parametric sinusoidal MIDI synthesizer, such as the LifeVibes JingleBlaster developed by Philips.

Referring to Figure 4 in respect of the first configuration, signal calibration 400, amplitude normalization 402 and pitch extraction 406 are performed as before, but pitch normalization is unnecessary. Loop point (start and end) determination 408, envelope parameter extraction 412 and pitch and amplitude modulation parameter extraction 414 are performed as before, and extraction 420 of harmonic frequencies, amplitudes (and, optionally, phases) is essentially the same as in the parametric sinusoidal audio encoding process. Phase extraction is optional, since phases can be computed at the synthesis stage. As in the previous case, pitch and amplitude modulation parameter extraction is also optional. The parameter storage format is dependent on the synthesizer implementation. Basically, this can be the same as for the known GM instruments.
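As an illustrative sketch only, harmonic amplitudes and phases can be estimated by projecting the signal onto sine and cosine components at multiples of the detected pitch. This direct projection is an assumption of the example, not necessarily the method used by any particular parametric encoder, and it is accurate only when the analysis window holds a whole number of pitch periods.

```python
import math


def extract_harmonics(samples, sample_rate, f0, n_harmonics):
    """Estimate (frequency, amplitude, phase) for each harmonic k * f0.

    The phase is defined for the model amplitude * sin(w * i + phase).
    """
    n = len(samples)
    result = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / sample_rate
        re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
        im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
        amplitude = 2.0 * math.hypot(re, im) / n
        phase = math.atan2(re, im)
        result.append((k * f0, amplitude, phase))
    return result


sr = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / sr)
        + 0.5 * math.sin(2.0 * math.pi * 400.0 * i / sr + 0.3)
        for i in range(400)]  # 400 samples = 10 periods of 200 Hz
harmonics = extract_harmonics(tone, sr, f0=200.0, n_harmonics=3)
```

The resulting triples have the same form as the parameters consumed by a sinusoidal synthesizer, so analysis and synthesis can share one storage format.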

For both configurations, in an improved version, it is possible to process the user-sound analysis for several samples, each one having a different pitch. Because sound characteristics of a given source (instrument, user singing voice etc.) are not constant over the whole musical note range, it is desirable to have such a frequency-split analysis, in order to reach a better sound naturalness. Considering the second configuration, the DLS level 1 specification allows up to 16 different samples for the same instrument. DLS also specifies how to manage these different samples.
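How several samples of different pitch might be used at playback can be sketched as below. The nearest-root selection rule and the function name are assumptions of the example and do not reproduce the region mechanism of the DLS specification; only the limit of 16 samples is taken from the text above.

```python
def select_sample(note, roots):
    """Pick the recorded root note closest to the requested MIDI note.

    Returns the chosen root and the playback-rate ratio needed to reach note.
    Ties are resolved in favour of the lower root.
    """
    if not roots:
        raise ValueError("no samples recorded")
    if len(roots) > 16:
        raise ValueError("more than 16 samples for one instrument")
    root = min(roots, key=lambda r: (abs(r - note), r))
    return root, 2.0 ** ((note - root) / 12.0)


# Hypothetical user instrument recorded at three pitches (MIDI notes C3, C4, C5).
recorded_roots = [48, 60, 72]
root, ratio = select_sample(64, recorded_roots)
```

Choosing the closest recording keeps the transposition small, which limits the timbre distortion introduced by resampling.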

Thus, as explained above, prior art MIDI polyphonic ringers do not allow end-users to personalize the sound rendering of their melodies without the help of an external computer. However, sound-rendering personalization is useful for portable devices such as mobile phones, in that it allows better user-ringer identification for an incoming call or incoming SMS notification and, depending on the tuning, it can produce distinctive-sounding ringtones, which are commonly appreciated from a young-user experience point of view. The proposed solution performs the sound rendering personalization through the use of the embedded microphone, which allows sounds such as the user's singing voice to be captured.

The vocal samples are not compressed in the proposed solution, but are instead analyzed in order to extract voice parameters and thereby build an instrument that can be used afterwards by the synthesizer kernel, substituting one or more voices (one or more instruments) when playing any incoming MIDI file. As a consequence, the user-sound (e.g. user-voice) captured by the microphone can be considered as a synthesizer initialization, which has to be done only once (or only each time the user wants to change the personalization); that is, it does not need to be done when the incoming MIDI file is changed.

The proposed invention combines already existing elements of mobile phone devices (microphone, MIDI/XMF player) together with a new module, preferably implemented as a software module, which aims to convert input sounds into instrument-characteristic data that can be used by the MIDI/XMF player. In this way, a "user" instrument is created.

Afterwards, by substituting one or more MIDI channels/instruments with the created "user-instrument", either inside the melody file in the case of an XMF-like file or inside the synthesizer in the case of a "simple" MIDI configuration, the synthesizer will be able to play input MIDI music with some channels producing music notes having the sound of the microphone-recorded sound (e.g. the user singing voice).

Applications of the present invention include:

Embedded solution in mobile or home phones equipped with MIDI or XMF synthesizer, for user-customized ringing (incoming call, SMS or any other notification alert) feature.

Embedded solution in mobile devices, such as mobile or cordless phones, or Personal Digital Assistants (PDAs), for entertainment purposes.

Integration in PC software, for use with a microphone or any other audio input, for entertainment purposes.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be capable of designing many alternative embodiments without departing from the scope of the invention as defined by the appended claims. In the claims, any reference signs placed in parentheses shall not be construed as limiting the claims. The words "comprising" and "comprises", and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The singular reference of an element does not exclude the plural reference of such elements and vice versa. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A portable electronic device (104, 204) comprising: a polyphonic synthesizer (115, 215); a microphone (102, 202) for receiving audio data (100, 200) input by a user; a stored bank (114, 214) of pre-stored ringtones formatted for playback by said polyphonic synthesizer (115, 215); a ringtone customization system for use with the polyphonic synthesizer (115, 215), said ringtone customization system comprising means (110, 210) for formatting the audio data (100, 200) input by a user for playback by said polyphonic synthesizer (115, 215), and means for combining said audio data with at least a portion of a pre-recorded ringtone to form a customized ringtone (120, 220) for playback by said polyphonic synthesizer (115, 215).

2. A portable electronic device according to claim 1, wherein a pre-recorded ringtone comprises a plurality of sound channels, one or more of which is substituted with said audio data (100, 200) input by the user to form said customized ringtone (120, 220).

3. A portable electronic device according to claim 1, wherein the ringtone customization system comprises means (108, 208) for extracting audio data (100, 200) input by the user from background noise received simultaneously at said microphone (102, 202).

4. A portable electronic device according to claim 1, wherein the ringtone customization system comprises means (110, 210) for extracting and storing audio parameters from said audio data (100, 200) input by the user.

5. A portable electronic device according to claim 4, wherein the analogue audio data is first digitized before audio parameter extraction.

6. A portable electronic device according to claim 1, wherein the pre-stored ringtones are in the MIDI format.

7. A portable electronic device according to claim 1, wherein said polyphonic synthesizer (115, 215) is wavetable-based and XMF compliant.

8. A ringtone customization system for use with a polyphonic synthesizer (115, 215) in a portable electronic device (104, 204), said portable electronic device further comprising a microphone (102, 202) for receiving audio data (100, 200) input by a user, and a stored bank (114, 214) of pre-stored ringtones formatted for playback by said polyphonic synthesizer (115, 215), the ringtone customization system comprising means (110, 210) for formatting the audio data (100, 200) input by a user for playback by said polyphonic synthesizer (115, 215), and means for combining said audio data with at least a portion of a pre-recorded ringtone to form a customized ringtone (120, 220) for playback by said polyphonic synthesizer (115, 215).

9. A method for ringtone customization in a polyphonic synthesizer (115, 215) of a portable electronic device (104, 204), said portable electronic device (104, 204) further comprising a microphone (102, 202) for receiving audio data (100, 200) input by a user, and a stored bank of pre-stored ringtones (114, 214) formatted for playback by said polyphonic synthesizer (115, 215), the method comprising formatting the audio data input by a user for playback by said polyphonic synthesizer (115, 215), and combining said audio data with at least a portion of a pre-stored ringtone to form a customized ringtone (120, 220) for playback by said polyphonic synthesizer (115, 215).