WO2007088490A1 - Device for and method of processing audio data - Google Patents

Device for and method of processing audio data

Info

Publication number
WO2007088490A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
data stream
unit
audio
mix
Prior art date
Application number
PCT/IB2007/050151
Other languages
French (fr)
Inventor
Fabio Vignoli
Albertus C. Den Brinker
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2007088490A1 publication Critical patent/WO2007088490A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H 60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/02 Arrangements for generating broadcast information; Arrangements for generating broadcast-related information with a direct linking to broadcast information or to broadcast space-time; Arrangements for simultaneous generation of broadcast information and broadcast-related information
    • H04H 60/04 Studio equipment; Interconnection of studios

Definitions

  • the invention relates to a device for processing audio data.
  • the invention further relates to a method of processing audio data.
  • the invention also relates to a program element.
  • the invention relates to a computer-readable medium.
  • Audio playback devices are becoming more and more important. Particularly, an increasing number of users buy portable and/or hard disk-based audio players and other entertainment equipment.
  • US 2003/0183064 Al discloses a sequential playback system configured to select each sequential song based upon characteristics of an ending segment of each preceding song. Songs are selected on the basis of characteristics of the overall theme of the selection, if any, and also on the basis of musical correspondence between songs. The correspondence may be based on rhythm, notes, chords and other musical characteristics of each song.
  • the end segment of each selected song is characterized, and the first segment of a candidate song that satisfies the overall selection criterion is compared with this characterization to determine a correspondence. If the first segment of the candidate song is inconsistent with the end segment of the previously selected song, another candidate song is found that satisfies the overall selection criterion and the first segment of this new candidate song is compared with the characterization. This process continues until a suitable candidate song is identified, or until a time limit is exceeded. Transition pieces are optionally provided to facilitate a smooth transition between songs.
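The candidate-selection loop described above can be sketched as follows. The function and field names (`select_next_song`, `start_profile`, `is_compatible`) are hypothetical, since the prior-art document specifies no interface, and the fallback on timeout is an assumption of this sketch:

```python
import time

def select_next_song(previous_end_profile, candidates, is_compatible,
                     time_limit_s=1.0):
    """Scan candidate songs for one whose first segment matches the end
    segment of the previously selected song, giving up after a time
    limit. On timeout this sketch simply falls back to the first
    candidate (the prior art leaves this case open)."""
    deadline = time.monotonic() + time_limit_s
    for candidate in candidates:
        if time.monotonic() > deadline:
            break
        if is_compatible(previous_end_profile, candidate["start_profile"]):
            return candidate
    return candidates[0] if candidates else None
```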
  • EP 0,995,191 Bl discloses a speech-processing system that receives multiple streams of speech frames. The system selects, among concurrent ones of the frames, a subset of those frames that are the most relevant, based on pre-assigned stream priorities and energy content of the frames. The selected frames are then decoded and rendered. The resulting signals are mixed. This architecture is considered to provide bandwidth scalability and/or processing power scalability. However, the system described in EP 0,995,191 Bl is complex and involves much computational burden for processing speech.
  • a device for processing audio data a method of processing audio data, a program element, and a computer-readable medium as defined in the independent claims are provided.
  • a device for processing audio data comprising a mixer unit adapted to mix a first audio data stream with a second audio data stream (particularly for generating a mix signal which is indicative of a mix of the first audio data stream with the second audio data stream), and a shared decoder unit adapted to decode a mix of the first audio data stream with the second audio data stream generated by the mixer unit.
  • a method of processing audio data comprising the steps of mixing a first audio data stream with a second audio data stream (particularly for generating a mix signal which is indicative of a mix of the first audio data stream with the second audio data stream), and decoding the mix of the first audio data stream with the second audio data stream by using a shared decoding unit.
  • a program element which, when executed by a processor, is adapted to control or carry out a method of processing audio data having the above-mentioned features.
  • a computer-readable medium in which a computer program is stored which, when executed by a processor, is adapted to control or carry out a method of processing audio data having the above-mentioned features.
  • the audio processing in accordance with embodiments of the invention can be realized by a computer program, i.e. by software, or by using one or more special electronic optimization circuits, i.e. in hardware, or in a hybrid form, i.e. by means of software components and hardware components.
  • a single common decoder may be used and/or a single common mixer may be used to generate a mix signal and decode the mix signal of two different audio data streams, for instance, two songs to be played back, while using an automatic DJ feature.
  • it may be dispensable to separately decode each of the audio data streams to be mixed together, which would require the implementation of two audio decoders. Taking this measure may allow a simplified construction of the audio data-processing device, because only a single decoder may be sufficient to decode the audio and thus give the mixed audio a proper format for audible reproduction. This may reduce the size and costs of manufacturing the audio data-processing and reproduction device, and may allow implementation of a mixing or DJ function, even in a device with limited computational resources, such as a mobile phone or any other portable consumer electronics device.
  • the shared decoding and/or mixing may be realized by using parametric audio encoding schemes, i.e. by providing the audio data streams to be processed together in accordance with a parametric audio encoding scheme, such as sinusoidal encoding (SSC).
  • the shared decoder and/or the mixer may be tailored so as to be capable of processing parametric audio content.
  • a method for audio DJ on mobile phones may be provided.
  • a method of producing a smooth transition between two songs of a playlist (auto DJ) is provided in mobile phone-like environments, in which parametric SSC encoding (for instance, in the context of a Jingle Blaster application) may be used to implement computationally efficient "time-stretching" (TSM), which may be required in a transition algorithm applied to generate a transition between two audio pieces.
  • a conventional DJ application on a playback device may take, as input, a generated playlist and some attributes from a database and may first determine the best transition (for instance, beat mixing, harmonic mixing, etc.) between songs of the playlist and, secondly, implement the transitions while the music is playing.
  • some additional algorithms such as time-stretching that run in real time may be needed or desired.
  • the auto DJ player needs to decode the last part of a preceding song together with the first part of a subsequent song during the transition, while applying time-stretching (TSM) and then doing the real mix.
  • an embodiment of the invention is based on an audio signal-processing architecture for generating an output signal comprising a transition from a first audio signal to a second audio signal.
  • a processing device or scheme may comprise an input unit adapted to receive a parametrically encoded first audio data (stream) representing the first audio signal and a parametrically encoded second audio data (stream) representing the second audio signal.
  • an output unit may be provided for supplying the output signal.
  • a processing unit may be adapted to perform the transition from the first audio signal to the second audio signal by processing parameters of the parametrically encoded first and second audio data, yielding a processed audio data stream.
  • a decoding unit may be adapted to decode the processed audio data stream, yielding the output signal.
  • the processing unit may be adapted for time-stretching by generating sinusoids and noise over longer frame sizes.
  • the processing of parameters of the parametrically encoded first and second audio data may comprise scaling of the amplitudes of the sinusoids, and/or scaling of the temporal envelope of the noise, and/or scaling of the amplitude of the transient envelope.
  • the parametric data may be interpolated (for instance, for synchronization purposes).
  • An embodiment of the invention is an auto DJ with SSC (sinusoidal coding).
  • a transition between two audio items may be generated by exactly one processor, which may be realized by implementing a parametrically encoded audio data stream, for instance, SSC.
  • Time-stretching may be realized efficiently in the SSC code simply by generating sinusoids and noise over longer frame sizes. This may mean that no extra processing is required. Only longer or shorter frames may be generated in the decoder, and the parametric data may be read at a slower or faster pace.
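As a rough illustration of this point, the same sinusoid parameters can be rendered over a longer output frame, slowing the tempo without changing the pitch. The frame layout below is a simplification for this sketch, not the actual SSC bitstream format:

```python
import math

def synthesize_frame(sinusoids, frame_len, sample_rate=44100):
    """Render (amplitude, frequency_hz, phase) partials over frame_len
    samples. Choosing a longer frame_len for the same parameters slows
    the tempo; the pitch is unchanged because the frequencies are."""
    out = [0.0] * frame_len
    for amp, freq, phase in sinusoids:
        w = 2.0 * math.pi * freq / sample_rate
        for n in range(frame_len):
            out[n] += amp * math.sin(w * n + phase)
    return out

partials = [(0.5, 440.0, 0.0)]
normal = synthesize_frame(partials, 1024)     # nominal tempo
stretched = synthesize_frame(partials, 2048)  # same parameters, half tempo
```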
  • Such a system may be implemented in a portable music player, for instance, included in a mobile phone.
  • an auto DJ feature for mobile telephones may be provided.
  • an auto DJ may be run in combination with a Jingle Blaster technology (based on SSC coding). This can be used to create an auto DJ feature with a low computational load that is even compatible with the limited computational resources of a mobile phone.
  • Such a method may allow avoiding the use of two (for instance, MP3) decoders at the same time, and replacing it with a single SSC decoder.
  • Such a method may also allow avoiding the use of two time-stretching modules (to adapt a tempo keeping the pitch invariant), because SSC parametric encoding can perform this at the parameter level.
  • a conventional system (similar to that disclosed in US 2003/0183064 Al) may be denoted as a media player with auto DJ mode.
  • a more refined system (similar to that shown in Fig. 1) may be based on an implementation of an auto DJ player.
  • Such an architecture may require the use of two simultaneous MP3 decoders (at least during the mix) and of two time-stretching algorithms (to match the tempi of the two mixing songs). This architecture may be inappropriate for portable players with little computational power.
  • an embodiment of the invention is based on automatic DJ-ing of two songs on a portable player, which is much less computationally intensive. Such an embodiment may make use of SSC parametric audio encoding.
  • the device may be adapted to process the first audio data stream with the second audio data stream being provided as parametric audio data streams.
  • Many conventional audio compression techniques are based on "waveform coding", such as MP3, which may achieve an improved compression ratio by exploiting irrelevancy and redundancy available in any audio signal.
  • parametric encoding schemes may be used for even higher compression ratios.
  • a parametric encoder does not, or not only, exploit limitations of the human hearing system, but attempts to model the audio signal, for instance, by describing an audio piece by means of "abstract" characteristic parameters. A high compression rate may thus be obtained.
  • parametric encoding may particularly denote an audio encoding scheme that analyzes an audio signal so as to model the signal and describe the particular audio signal with parameter values specifying the parameter model.
  • an incoming audio signal may be dissected into, for instance, three "objects", namely transients (which may be localized in time), sinusoids (which may be localized in frequency), and noise (without any strict localization).
  • the device may be adapted to process the first audio data stream with the second audio data stream being provided as sinusoidally encoded audio data streams.
  • sinusoidal encoding may be applied to the audio data streams to be processed in accordance with an embodiment of the invention.
  • an example of a parametric encoding scheme (which may be advantageously implemented in embodiments of the invention) is sinusoidal encoding (SSC).
  • Sinusoidal encoding may aim at modeling a signal as a sum of sinusoids and is a suitable way to implement compression strategies for the purpose of encoding speech signals that, indeed, have some kind of periodic behavior.
  • Such an encoding technique may result in an efficient and robust representation of audio signals and may be applied in the context of the invention, particularly for mixing and generating transients and transitions between two audio items.
  • Such a parametric encoder based on the SSC principle may also analyze a stereo input and may derive transients, sinusoids, noise, and optionally stereo parameters describing the audio content in terms of parameter values.
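The three object classes can be pictured with a small container type. The field layout below is purely illustrative and does not reflect the actual MPEG-4 SSC syntax:

```python
from dataclasses import dataclass, field

@dataclass
class SSCFrame:
    """Illustrative parametric frame holding the three object classes
    described above (the field layout is an assumption of this sketch,
    not the real MPEG-4 SSC bitstream syntax)."""
    transients: list = field(default_factory=list)      # (position, amplitude)
    sinusoids: list = field(default_factory=list)       # (amplitude, freq_hz, phase)
    noise_envelope: list = field(default_factory=list)  # temporal gain curve

frame = SSCFrame(
    transients=[(12, 0.9)],                              # localized in time
    sinusoids=[(0.5, 440.0, 0.0), (0.25, 880.0, 0.0)],   # localized in frequency
    noise_envelope=[0.1, 0.2, 0.1],                      # no strict localization
)
```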
  • the device may be adapted to process the first audio data stream with the second audio data stream being provided as MPEG audio data streams, more particularly as MPEG-4 audio data streams.
  • MPEG stands for "Moving Picture Experts Group".
  • MPEG-1 is an initial video and audio compression standard.
  • MPEG-2 is a transport, video and audio standard for broadcast quality television.
  • MPEG-4 expands MPEG-1 to support video/audio "objects", 3D content, low bit-rate encoding, etc.
  • Embodiments of the invention may be implemented in the context with any existing or future MPEG version.
  • the mixer unit may be adapted to mix the first audio data stream with the second audio data stream so as to generate a transition between an end portion of the first audio data stream and a beginning portion of the second audio data stream.
  • the mixer unit may thus apply some kind of auto DJ function, or, in other words, generate transitions between an outgoing audio data piece and an incoming audio data piece.
  • auto DJ reference is made to US 2003/0183064 Al.
  • the device may comprise a shared time-stretching module adapted to control a tempo of the first audio data stream and the second audio data stream.
  • By using a time-stretching module that is shared among the two (or more) audio items to be processed, i.e. a single time-stretching module common to both audio data streams, the effort necessary for constructing such a device may be significantly reduced.
  • the shared time-stretching module may be located in advance of the shared decoder unit. For instance, an output of the shared time-stretching module may be coupled to an input of the shared decoder unit. In other words, the shared time-stretching module, which may be combined with the mixing unit, may be arranged in the signal path before decoding. This measure may significantly simplify signal processing.
  • the device may comprise a gain unit adapted to adjust or control a gain of the mix of the first audio data stream and the second audio data stream.
  • a gain may be under the control of a user operating an input/output interface and may allow the user to adjust an amplitude of the generated audio item.
  • the gain unit may be adapted to adjust the mix of the first audio data stream and the second audio data stream with regard to at least one of the group consisting of an amplitude of sinusoids indicative of the mix of the first audio data stream and the second audio data stream, a temporal envelope of noise indicative of the mix of the first audio data stream and the second audio data stream, and a scaling of an amplitude of a transient envelope indicative of the mix of the first audio data stream and the second audio data stream.
  • the gain unit may thus be particularly applied in the context of SSC encoding and decoding. For realizing a transition from a first song to a second song, a gain may be attached to each type of parametric data. In particular, this may include scaling the amplitudes of the sinusoids, scaling the temporal envelope of the noise, and scaling the amplitude of the transient envelope.
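A minimal sketch of such parameter-level gain control follows, assuming an illustrative frame layout (plain dicts of tuples, not the real SSC data structures):

```python
def apply_transition_gain(frame, gain):
    """Scale every parametric amplitude in one frame by a transition
    gain: sinusoid amplitudes, the temporal noise envelope, and
    transient envelope amplitudes. The frame layout is illustrative."""
    return {
        "sinusoids": [(gain * a, f, p) for a, f, p in frame["sinusoids"]],
        "noise_envelope": [gain * g for g in frame["noise_envelope"]],
        "transients": [(pos, gain * a) for pos, a in frame["transients"]],
    }

frame_a = {"sinusoids": [(0.8, 440.0, 0.0)],
           "noise_envelope": [0.2, 0.4],
           "transients": [(3, 1.0)]}
faded = apply_transition_gain(frame_a, 0.5)  # outgoing song at half gain
```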
  • the device may comprise a synchronization unit adapted to synchronize the first audio data stream and the second audio data stream. For instance, as a consequence of time scaling, the frames of both streams to be processed with one another may run at different rates, and the parametric data may therefore be interpolated for synchronization purposes.
  • Many synchronization techniques that are available and known to the person skilled in the art may be implemented in this context.
  • the synchronization unit may be adapted to synchronize the first audio data stream and the second audio data stream by interpolating the first audio data stream and the second audio data stream. This may harmonize the signal processing and may take differences or characteristics of the two audio data streams into account.
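Such interpolation might look as follows for matched (amplitude, frequency) tracks. Pairing partials by list index is a simplifying assumption of this sketch, since a real decoder must link partials across frames:

```python
def interpolate_params(params_a, params_b, t):
    """Linearly interpolate matched (amplitude, frequency) parameter
    tracks between two frame grids running at different rates; t in
    [0, 1] locates the output frame between them."""
    return [((1.0 - t) * a0 + t * a1, (1.0 - t) * f0 + t * f1)
            for (a0, f0), (a1, f1) in zip(params_a, params_b)]

# halfway between two frames that drifted apart after time scaling
mid = interpolate_params([(0.8, 440.0)], [(0.4, 442.0)], 0.5)
```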
  • the mixer unit may be adapted to apply a psycho-acoustic trick (i.e. psycho-acoustical knowledge or rules for eliminating inaudible components or selecting dominant components) to the first audio data stream and the second audio data stream. Applying psycho-acoustical knowledge or rules for eliminating inaudible components or selecting dominant components during processing of the data may improve the subjectively perceived quality of the processed data.
  • An example of psycho-acoustical knowledge or rules for eliminating inaudible components or selecting dominant components which may be applied so as to improve the subjective quality perceived by a human listener, is the missing fundamental principle.
  • the device may be realized as a portable device.
  • the computational resources may be limited in a portable device that is adapted to be used in a mobile manner by a user without the necessity of permanently keeping this device in a fixed position.
  • Embodiments of the invention may be advantageously applied in the context of a portable device, because the parametric encoding may allow generation of an auto DJ application with a low computational load.
  • the audio device may comprise an audio reproduction unit such as a loudspeaker, an earpiece or a headset.
  • the communication between audio-processing components of the audio device and such a reproduction unit may be carried out in a wired manner (for instance, using a cable) or in a wireless manner (for instance, via a WLAN, infrared communication or Bluetooth).
  • the audio device may be realized as a portable audio player, a DVD player, a CD player, a hard disk-based media player, an Internet radio device, a public entertainment device, an MP3 player, a game console, a vehicle entertainment device, a car entertainment device, a portable video player, a mobile phone, a medical communication system, a body-worn device, or a hearing aid device.
  • a "car entertainment device" may be a hi-fi system for an automobile.
  • such a system may be primarily intended to facilitate the playback of sound or audio data, or may be a system for a combination of audio data and video data.
  • an embodiment of the invention may be implemented in audiovisual applications such as a video player in which a loudspeaker is used, or a home cinema system.
  • Fig. 1 shows an audio data-processing system.
  • Fig. 2 shows an audio data-processing device.
  • Fig. 3 shows an embodiment of an audio data-processing device according to the invention.
  • Fig. 4 shows an embodiment of an audio data-processing system according to the invention.
  • the audio data-processing system 100 includes an implementation of auto DJ in a manner similar to that described in US 2003/0183064.
  • The following is a high-level description of this auto DJ implementation.
  • the Auto DJ application is a complete application that helps the user in choosing some music according to his preferences or the situation at hand and plays it back as a DJ would do.
  • Such an auto DJ system 100 comprises a user interface 101 (to enable navigation and playlist generation), a playlist generation unit 102, a database unit 103, a transition analyzer unit 104, a player unit 105, and a content analysis algorithm unit 106. Furthermore, real-time rendering algorithms may be performed in such a system.
  • the auto DJ application can be described with reference to the block diagram as shown in Fig. 1.
  • the user interface 101 is implemented in the framework of a MediaBrowser and is used to obtain a human user's preferences. These preferences can be expressed in the form of genre (for instance, "I want Rock music") or in the form of selected songs (for instance, "I want music similar to Billy Jean by Michael Jackson") or other forms.
  • the choice for the playlist generation unit 102 determines the form of these preferences, and vice versa.
  • the playlist generation unit 102 may take the user preferences and may generate a sequence of songs based on information coming from the database unit 103.
  • the database unit 103 may contain two different types of metadata, namely catalog metadata and content analysis metadata.
  • Catalog metadata may include information such as the artist's name, album name, song name, genre, etc.
  • Catalog metadata may be available from ID3 tags (or from online third parties).
  • Content analysis metadata may be obtained by applying content analysis algorithms directly on the music files.
  • a user enters user preferences 107 via the user interface 101.
  • the playlist generation unit 102 receives such user preferences 107 and also receives catalog metadata and content analysis metadata 108 provided by the database unit 103.
  • the database unit 103 provides content analysis metadata 109 to the transition analyzer unit 104.
  • the database unit 103 is supplied with content analysis data 110 provided by the content analysis algorithm unit 106 which receives audio data 111 from an audio source 112 on which content is stored, such as a hard disk, a CD or a DVD.
  • the system formed by components 112 and 106 may be operated offline.
  • each song may have a unique identifier.
  • Sequence and transition information 113 is provided at an output of the transition analyzer unit 104 and is supplied to the player unit 105.
  • Audio content 114 is provided to the player unit 105 by an audio source 115 on which content is stored, such as a hard disk, a CD or a DVD.
  • Mixed audio 116 is provided at an output of the player unit 105.
  • a design issue that follows from target platform properties is to limit the number of real-time tasks for the auto DJ. Some tasks, such as time scale modification, will have to be performed in real time. Other tasks, however, need not run in real time.
  • An example is the analysis of audio to determine the optimal position for the start of mixing two tracks. A possible way to do this is to use an idle-priority process and store essential results in the database unit 103.
  • the content analysis algorithms carried out by the content analysis algorithm unit 106 can be run offline or in the background. They provide new metadata used by the playlist generation unit 102 and the player unit 105, which will be added to the database unit 103, but are not directly involved in generating or rendering the playlist.
  • the player unit 105 takes, as an input, the generated playlist and some attributes from the database unit 103 and performs two operations: first, it determines the best transition between the songs (for instance, beat mixing, harmonic mixing, etc.) and, secondly, it implements the transitions while the music is playing. To implement a smooth transition, some additional algorithms such as time-stretching that run in real time may be needed.
  • the audio DJ player may be provided with a structure as shown in Fig. 2.
  • a first song 200 is provided to a first MP3 decoder 201.
  • a second song 202 is provided to a second MP3 decoder 203.
  • the MP3 decoders 201, 203 decode the MPEG content 200, 202, respectively, and provide the generated audio data to respective Time Stretching Module (TSM) units 204, 205.
  • the outputs of the two TSM units 204, 205 are supplied to an input of a mixer unit 206 that mixes the audio content and provides a mixed audio content to a gain unit 207 for adjusting the gain of the audio content to be played back.
  • mixed song data 208 is provided for playback.
  • the auto DJ player unit 105 needs to decode the last part of the first song 200 together with the first part of the second song 202 during the transition, apply time-stretching (TSM) by using the TSM units 204, 205, and then perform the real mix in the mixer unit 206.
  • the device 300 for processing audio data comprises a shared mixer unit 302 adapted to mix first audio data with second audio data (for generating a mix signal provided at an output of the mixer unit 302, which signal is indicative of a mix of the first audio data with the second audio data), and comprises a shared decoder unit 303 adapted to decode the mix signal representing the mix of the first audio data with the second audio data generated by the mixer unit 302.
  • first and second audio data items are provided by an SSC song unit 301. These first and second audio data items are provided at an input of the SSC TSM/mixer unit 302 for processing the song sequence and transition.
  • a bit stream in accordance with an output of the SSC TSM/mixer unit 302 is applied to the SSC decoder unit 303, i.e. a single decoder for decoding the mixed audio content.
  • Mixed songs 304 for reproduction by a reproduction unit (not shown in Fig. 3), such as a loudspeaker, are provided at an output of the SSC decoder unit 303.
  • the device 300 thus processes the first and second audio data as parametric audio data streams, namely as sinusoidally encoded audio data streams (SSC).
  • the audio data pieces provided by the SSC song unit 301 may be provided as MPEG-4 audio data streams.
  • the mixer unit 302 mixes the first audio data with the second audio data so as to generate a transition portion between an end portion of the first audio data and the beginning portion of the second audio data.
  • An auto DJ implementation may thus be made possible with low computational effort.
  • a shared time-stretching functionality may be integrated in the mixer unit 302 so that the tempo of the first audio data and the second audio data may be controlled.
  • the system 300 may be implemented in a portable device, for instance, in a mobile phone having an auto DJ function.
  • the embodiment shown in Fig. 3 takes advantage of the parametric SSC encoding to avoid both the TSM and the MP3 decoding steps that are necessary in accordance with the scheme of Fig. 2.
  • this is achieved by shifting the TSM and the mixer 302 to the front of the processing path and the decoder 303 to the back of the processing chain. Furthermore, this may be achieved by taking a bit stream as input, which allows efficient ways of TSM (and optionally mixing), i.e. by using parametrically encoded audio data such as MPEG-4 SSC.
  • Time-stretching is very efficiently realized in the SSC code simply by generating sinusoids and noise over longer frame sizes. This essentially means that no extra processing is required. Only longer or shorter frames have to be generated in the decoder unit 303, and the parametric data is read at a slower or faster pace.
  • a gain is attached to each type of parametric data.
  • the two sets of sinusoids are merged. For efficiency reasons, it is advantageous to discard the irrelevant part of the sinusoids.
  • This may be done inside the mixer unit 302 by applying a simple psycho-acoustic model (which may include psycho- acoustical knowledge or rules for eliminating inaudible components or selecting dominant components).
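A crude stand-in for such a model simply merges the two sinusoid sets and keeps only the strongest components; a real psycho-acoustic model would apply masking thresholds rather than rank by raw amplitude:

```python
def merge_sinusoids(sins_a, sins_b, max_partials=40):
    """Merge two sets of (amplitude, freq_hz, phase) partials and keep
    only the strongest ones, discarding the irrelevant remainder. The
    amplitude ranking is a simplification of a psycho-acoustic model."""
    merged = sorted(sins_a + sins_b, key=lambda s: s[0], reverse=True)
    return merged[:max_partials]

kept = merge_sinusoids([(0.5, 440.0, 0.0)],
                       [(0.9, 300.0, 0.0), (0.1, 5000.0, 0.0)],
                       max_partials=2)
```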
  • the two noise models may be merged as well.
  • the data describing the noise synthesis filters of both streams are recalculated to one joint noise-generating synthesis filter, and the temporal envelopes are recalculated to a single temporal envelope.
  • the transient components can also be merged. However, since transients rarely occur simultaneously in both audio streams in the same frame, an effective way of dealing with them is to take the strongest one whenever transients do occur simultaneously in both streams in the same frame. In all other cases, the transients of both streams are incorporated in the transient stream.
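The merging rule for transients described above can be sketched as follows, with transients represented as hypothetical (position, amplitude) pairs:

```python
def merge_transients(transients_a, transients_b):
    """Merge the transient objects of two streams for one frame: when
    both streams carry a transient in the same frame, keep only the
    strongest; otherwise pass all transients through unchanged."""
    if transients_a and transients_b:
        return [max(transients_a + transients_b, key=lambda t: t[1])]
    return list(transients_a) + list(transients_b)
```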
  • the output of the mixer is a new (joint) parametric (SSC) audio stream, which can be interpreted by a standard SSC decoder.
  • a human user may control a central control unit 402 via a user input/output interface 401.
  • a user interface 401 may be a graphical user interface (GUI).
  • the graphical user interface may include a display device (e.g. a cathode ray tube, a liquid crystal display, a plasma display device or the like) for displaying information to a human operator.
  • the user interface 401 may comprise an input device allowing the user to input data (e.g. data specifying the operation mode of the system 400) or to provide the system 400 with control commands.
  • Such an input device may include a keypad, a joystick, a trackball, a touch screen, or may even be a microphone of a voice recognition system.
  • the user interface 401 may allow a human user to communicate in a bi-directional manner with the system 400.
  • control unit 402 e.g. a microprocessor, central processing unit, CPU
  • the control unit 402 may control the functionality of the other components shown in Fig. 4.
  • Audio content may be stored in an audio source 403 (such as a hard disk, a CD or a DVD) and may be provided to a mixing unit 302.
  • the mixing unit 302 may mix two songs to generate a transition in the context of an auto DJ application.
  • the mixing unit 302 may be adapted to mix two audio data items provided in a parametric format.
  • the output of the mixer unit 302 is supplied to the decoder unit 303 for decoding the mixed data so as to generate at its output data to be played back and representing a transition between the first and the second song.
  • a gain unit 404 may be provided to adjust the gain of the mixed audio items.
  • Reproducible audio data may be provided at an output of the gain unit 404 and supplied to a loudspeaker 405 for reproduction.
  • earpieces or headphones may be connected. It should be noted that use of the verb "comprise” and its conjugations does not exclude other elements, steps or features and that use of the indefinite article “a” or “an” does not exclude a plurality. Also elements described in association with different embodiments may be combined.

Abstract

A device (300) for processing audio data, the device (300) comprising a mixer unit (302) adapted to mix a first audio data stream with a second audio data stream, and a shared decoder unit (303) adapted to decode a mix of the first audio data stream with the second audio data stream generated by the mixer unit (302).

Description

DEVICE FOR AND METHOD OF PROCESSING AUDIO DATA
FIELD OF THE INVENTION
The invention relates to a device for processing audio data.
The invention further relates to a method of processing audio data.
The invention also relates to a program element.
Furthermore, the invention relates to a computer-readable medium.
BACKGROUND OF THE INVENTION
Audio playback devices are becoming more and more important. Particularly, an increasing number of users buy portable and/or hard disk-based audio players and other entertainment equipment.
US 2003/0183064 A1 discloses a sequential playback system configured to select each sequential song based upon characteristics of an ending segment of each preceding song. Songs are selected on the basis of characteristics of the overall theme of the selection, if any, and also on the basis of musical correspondence between songs. The correspondence may be based on rhythm, notes, chords and other musical characteristics of each song. The end segment of each selected song is characterized, and the first segment of a candidate song that satisfies the overall selection criterion is compared with this characterization to determine a correspondence. If the first segment of the candidate song is inconsistent with the end segment of the previously selected song, another candidate song is found that satisfies the overall selection criterion and the first segment of this new candidate song is compared with the characterization. This process continues until a suitable candidate song is identified, or until a time limit is exceeded. Transition pieces are optionally provided to facilitate a smooth transition between songs.
EP 0,995,191 B1 discloses a speech-processing system that receives multiple streams of speech frames. The system selects, among concurrent ones of the frames, a subset of those frames that are the most relevant, based on pre-assigned stream priorities and energy content of the frames. The selected frames are then decoded and rendered. The resulting signals are mixed. This architecture is considered to provide bandwidth scalability and/or processing power scalability. However, the system described in EP 0,995,191 B1 is complex and involves much computational burden for processing speech.
OBJECT AND SUMMARY OF THE INVENTION
It is an object of the invention to allow efficient audio data-processing.
In order to achieve the object defined above, a device for processing audio data, a method of processing audio data, a program element, and a computer-readable medium as defined in the independent claims are provided.
In accordance with an embodiment of the invention, a device for processing audio data is provided, the device comprising a mixer unit adapted to mix a first audio data stream with a second audio data stream (particularly for generating a mix signal which is indicative of a mix of the first audio data stream with the second audio data stream), and a shared decoder unit adapted to decode a mix of the first audio data stream with the second audio data stream generated by the mixer unit.
In accordance with another embodiment of the invention, a method of processing audio data is provided, the method comprising the steps of mixing a first audio data stream with a second audio data stream (particularly for generating a mix signal which is indicative of a mix of the first audio data stream with the second audio data stream), and decoding the mix of the first audio data stream with the second audio data stream by using a shared decoding unit.
In accordance with a further embodiment of the invention, a program element is provided, which, when being executed by a processor, is adapted to control or carry out a method of processing audio data having the above-mentioned features.
In accordance with yet another embodiment of the invention, a computer-readable medium is provided, in which a computer program is stored which, when being executed by a processor, is adapted to control or carry out a method of processing audio data having the above-mentioned features.
The audio processing in accordance with embodiments of the invention can be realized by a computer program, i.e. by software, or by using one or more special electronic optimization circuits, i.e. in hardware, or in a hybrid form, i.e. by means of software components and hardware components.
In accordance with an embodiment, a single common decoder may be used and/or a single common mixer may be used to generate a mix signal and decode the mix signal of two different audio data streams, for instance, two songs to be played back, while using an automatic DJ feature. By sharing exactly one decoder between the two audio items, there is no need to decode each of the audio data streams to be mixed separately, which would require the implementation of two audio decoders. Taking this measure may allow a simplified construction of the audio data-processing device, because only a single decoder may be sufficient to decode the audio and thus give the mixed audio a proper format for audible reproduction. This may reduce the size and costs of manufacturing the audio data-processing and reproduction device, and may allow implementation of a mixing or DJ function, even in a device with limited computational resources, such as a mobile phone or any other portable consumer electronics device.
Particularly, the shared decoding and/or mixing may be realized by using parametric audio encoding schemes, i.e. by providing the audio data streams to be processed together in accordance with a parametric audio encoding scheme, such as sinusoidal encoding (SSC). In such a scenario, the shared decoder and/or the mixer may be tailored so as to be capable of processing parametric audio content.
In accordance with an embodiment of the invention, a method for audio DJ on mobile phones may be provided. Particularly, a method of producing a smooth transition between two songs of a playlist (auto DJ) in mobile phone-like environments is provided, in which parametric SSC encoding (for instance, in the context of a Jingle Blaster application) may be used to implement computationally efficient "time-stretching" (TSM), which may be required in a transition algorithm applied to generate a transition between two audio pieces.
A conventional DJ application on a playback device may take, as input, a generated playlist and some attributes from a database and may first determine the best transition (for instance, beat mixing, harmonic mixing, etc.) between songs of the playlist and, secondly, implement the transitions while the music is playing. To implement a smooth transition, some additional algorithms such as time-stretching that run in real time may be needed or desired. However, in such an implementation, the auto DJ player needs to decode the last part of a preceding song together with the first part of a subsequent song during the transition, while applying time-stretching (TSM) and then doing the real mix.
In contrast to such a conventional approach, an embodiment of the invention is based on an audio signal-processing architecture for generating an output signal comprising a transition from a first audio signal to a second audio signal. Such a processing device or scheme may comprise an input unit adapted to receive a parametrically encoded first audio data (stream) representing the first audio signal and a parametrically encoded second audio data (stream) representing the second audio signal. Furthermore, an output unit may be provided for supplying the output signal. Moreover, a processing unit may be adapted to perform the transition from the first audio signal to the second audio signal by processing parameters of the parametrically encoded first and second audio data, yielding a processed audio data stream. A decoding unit may be adapted to decode the processed audio data stream, yielding the output signal.
In accordance with an embodiment, the processing unit may be adapted for time-stretching by generating sinusoids and noise over longer frame sizes.
In accordance with another embodiment, the processing of parameters of the parametrically encoded first and second audio data may comprise scaling of the amplitudes of the sinusoids, and/or scaling of the temporal envelope of the noise, and/or scaling of the amplitude of the transient envelope.
In accordance with a further embodiment, the parametric data may be interpolated (for instance, for synchronization purposes).
An embodiment of the invention is an auto DJ with SSC (sinusoidal coding). In such a technology, a transition between two audio items may be generated by exactly one processor, which may be realized by implementing a parametrically encoded audio data stream, for instance, SSC.
An audio DJ functionality may thus be obtained with reasonable computational complexity. Only one decoder needs to be run. The computational load of the time-stretching (TSM) can be significantly reduced by shifting the TSM and mixer to the front and the decoder to the back of the processing chain. Furthermore, this can be obtained by taking a bit stream as input, which allows efficient ways of TSM (and optionally mixing), i.e. by using parametrically encoded audio data such as MPEG-4 SSC.
Time-stretching may be realized efficiently in the SSC code simply by generating sinusoids and noise over longer frame sizes. This may mean that no extra processing is required. Only longer or shorter frames may be generated in the decoder, and the parametric data may be read at a slower or faster pace.
Such a system may be implemented in a portable music player, for instance, included in a mobile phone.
In accordance with an embodiment, an auto DJ feature for mobile telephones may be provided. Thus, an auto DJ may be run in combination with a Jingle Blaster technology (based on SSC coding). This can be used to create auto DJ with a low computational load which is even compatible with the limited computational resources of a mobile phone. Such a method may allow avoiding the use of two (for instance, MP3) decoders at the same time, and replacing it with a single SSC decoder. Such a method may also allow avoiding the use of two time-stretching modules (to adapt a tempo keeping the pitch invariant), because SSC parametric encoding can perform this at the parameter level.
A conventional system (similar to that disclosed in US 2003/0183064 A1) may be denoted as a media player with auto DJ mode. A more refined system (similar to that shown in Fig. 1) may be based on an implementation of an auto DJ player. Such an architecture may require the use of two simultaneous MP3 decoders (at least during the mix) and the use of two time-stretching algorithms (to match the tempi of the two mixing songs). This architecture may be inappropriate for portable players with little computational power.
In order to meet the requirements of such a limited computational power, an embodiment of the invention is based on automatic DJ-ing of two songs on a portable player that is much less computationally intensive. Such an embodiment may make use of SSC parametric audio encoding.
Further embodiments of the device for processing audio data will be explained hereinafter. However, these embodiments also apply to the method of processing audio data, the program element and the computer-readable medium.
The device (particularly the mixing unit and/or the decoder unit) may be adapted to process the first audio data stream with the second audio data stream being provided as parametric audio data streams. Many conventional audio compression techniques are based on "waveform coding", such as MP3, which may achieve an improved compression ratio by exploiting irrelevancy and redundancy available in any audio signal. However, parametric encoding schemes may be used for even higher compression ratios. A parametric encoder does not or not only exploit limitations of the human hearing system, but attempts to model the audio signal, for instance, by describing an audio piece by means of "abstract" characteristic parameters. A high compression rate may thus be obtained.
In the light of the foregoing, the term "parametric encoding" may particularly denote an audio encoding scheme that analyzes an audio signal so as to model the signal and describe the particular audio signal with parameter values specifying the parameter model. In the context of a parametric audio data code, an incoming audio signal may be dissected into, for instance, three "objects", namely transients (which may be localized in time), sinusoids (which may be localized in frequency), and noise (without any strict localization). However, alternative parametric encoding schemes are possible.
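To make the notion of a parametric representation more concrete, the following minimal sketch models one frame as a small set of parameters from which samples are synthesized. The names (`SinusoidParam`, `ParametricFrame`, `synthesize`) and the placeholder gains standing in for the noise and transient models are illustrative assumptions, not the actual SSC format:

```python
import math
from dataclasses import dataclass

# Hypothetical, simplified model of one frame of a parametric (SSC-style)
# audio stream: the signal is described by sinusoid parameters plus
# placeholder gains standing in for the noise and transient models.
@dataclass
class SinusoidParam:
    freq_hz: float
    amplitude: float
    phase: float = 0.0

@dataclass
class ParametricFrame:
    sinusoids: list           # list of SinusoidParam
    noise_gain: float = 0.0   # stand-in for a full noise model
    transient_gain: float = 0.0

def synthesize(frame, n_samples, sample_rate=8000.0):
    """Render the sinusoidal part of a frame to time-domain samples."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(p.amplitude
                       * math.cos(2 * math.pi * p.freq_hz * t + p.phase)
                       for p in frame.sinusoids))
    return out
```

A real parametric codec would, of course, carry full noise-filter and transient descriptions per frame; the point here is only that a frame is a compact set of parameters rather than waveform samples.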
The device (particularly the mixing unit and/or the decoder unit) may be adapted to process the first audio data stream with the second audio data stream being provided as sinusoidally encoded audio data streams. In other words, sinusoidal encoding (SSC) may be applied to the audio data streams to be processed in accordance with an embodiment of the invention. Thus, an example of a parametric encoding scheme (which may be advantageously implemented in embodiments of the invention) is sinusoidal encoding (SSC). Sinusoidal encoding may aim at modeling a signal as a sum of sinusoids and is a suitable way to implement compression strategies for the purpose of encoding speech signals that, indeed, have some kind of periodic behavior. Consequently, such an encoding technique may result in an efficient and robust representation of audio signals and may be applied in the context of the invention, particularly for mixing and generating transients and transitions between two audio items. Such a parametric encoder based on the SSC principle may also analyze a stereo input and may derive transients, sinusoids, noise, and optionally stereo parameters describing the audio content in terms of parameter values.
For a more detailed explanation of SSC, particular reference is made to M. G. Muzzi, "Amelioration d'un codeur parametrique, DEA ATIAM 2002 - 2003", available via: http://recherche.ircam.fr/equipes/repmus/MemoiresATIAM0205/Muzzi.pdf. With regard to this document, explicit reference is made to chapter 3.4 describing parametric encoding and chapter 4 explaining sinusoidal encoding. Explicit reference is made to those parts of the cited document that may be implemented in embodiments of the invention.
The device may be adapted to process the first audio data stream with the second audio data stream being provided as MPEG audio data streams, more particularly as MPEG-4 audio data streams. MPEG (Moving Picture Experts Group) is a working group of ISO/IEC charged with the development of video and audio encoding standards. MPEG-1 is an initial video and audio compression standard. MPEG-2 is a transport, video and audio standard for broadcast quality television. MPEG-4 expands MPEG-1 to support video/audio "objects", 3D content, low bit-rate encoding, etc. Embodiments of the invention may be implemented in the context of any existing or future MPEG version.
The mixer unit may be adapted to mix the first audio data stream with the second audio data stream so as to generate a transition between an end portion of the first audio data stream and a beginning portion of the second audio data stream. The mixer unit may thus apply some kind of auto DJ function, or, in other words, generate transitions between an outgoing audio data piece and an incoming audio data piece. With regard to the art of auto DJ, reference is made to US 2003/0183064 A1.
The device may comprise a shared time-stretching module adapted to control a tempo of the first audio data stream and the second audio data stream. By using a time-stretching module, which is shared among the two (or more) audio items to be processed, i.e. by using a single time-stretching module for both audio data streams in common, the effort necessary for constructing such a device may be significantly reduced.
In an audio data-processing path, the shared time-stretching module may be located in advance of the shared decoder unit. For instance, an output of the shared time-stretching module may be coupled to an input of the shared decoder unit. In other words, the shared time-stretching module that may be combined with the mixing unit may be arranged in the signal path before decoding. This measure may significantly simplify signal processing.
The device may comprise a gain unit adapted to adjust or control a gain of the mix of the first audio data stream and the second audio data stream. Such a gain may be under the control of a user operating an input/output interface and may allow the user to adjust an amplitude of the generated audio item.
Particularly, the gain unit may be adapted to adjust the mix of the first audio data stream and the second audio data stream with regard to at least one of the group consisting of an amplitude of sinusoids indicative of the mix of the first audio data stream and the second audio data stream, a temporal envelope of noise indicative of the mix of the first audio data stream and the second audio data stream, and a scaling of an amplitude of a transient envelope indicative of the mix of the first audio data stream and the second audio data stream. The gain unit may thus be particularly applied in the context of SSC encoding and decoding. For realizing a transition from a first song to a second song, a gain may be adjusted with the parametric data for each of the parametric data. In particular, this may include scaling the amplitudes of the sinusoids, the temporal envelope of the noise, and the scaling of the amplitude of the transient envelope.
The device may comprise a synchronization unit adapted to synchronize the first audio data stream and the second audio data stream. For instance, as a consequence of time scaling, the frames of both streams to be processed with one another may run at different rates, and the parametric data may therefore be interpolated for synchronization purposes. Many synchronization techniques, which are available and known to the person skilled in the art, may be implemented in this context.
Particularly, the synchronization unit may be adapted to synchronize the first audio data stream and the second audio data stream by interpolating the first audio data stream and the second audio data stream. This may harmonize the signal processing and may take differences or characteristics of the two audio data streams into account.
The mixer unit may be adapted to apply a psycho-acoustic trick, i.e. psycho-acoustical knowledge or rules for eliminating inaudible components or selecting dominant components, to the first audio data stream and the second audio data stream. Applying such knowledge or rules during processing of the data may improve the subjectively perceived quality of the processed data. An example of such a rule, which may be applied so as to improve the subjective quality perceived by a human listener, is the missing fundamental principle.
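The interpolation-based synchronization mentioned above can be illustrated with a simple linear interpolation of a frame-wise parameter track (for instance, a sinusoid amplitude). This is only one of many possible strategies; the function name and the (time, value) layout are assumptions made for this sketch:

```python
# Illustrative linear interpolation of a frame-wise parameter track: when
# two streams' frame grids run at different rates, a parameter value needed
# "between" two frames can be obtained by interpolation.
def interp_param(frames, t):
    """frames: list of (frame_time, value) pairs sorted by time.
    Returns the value linearly interpolated at time t (clamped at the ends)."""
    if t <= frames[0][0]:
        return frames[0][1]
    for (t0, v0), (t1, v1) in zip(frames, frames[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return v0 + w * (v1 - v0)
    return frames[-1][1]
```

Any parameter track (amplitudes, envelope points) could be resampled this way onto a common frame grid before the two streams are merged.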
The device may be realized as a portable device. The computational resources may be limited in a portable device that is adapted to be used in a mobile manner by a user without the necessity of permanently keeping this device in a fixed position. Embodiments of the invention may be advantageously applied in the context of a portable device, because the parametric encoding may allow generation of an auto DJ application with a low computational load.
The audio device may comprise an audio reproduction unit such as a loudspeaker, an earpiece or a headset. The communication between audio-processing components of the audio device and such a reproduction unit may be carried out in a wired manner (for instance, using a cable) or in a wireless manner (for instance, via a WLAN, infrared communication or Bluetooth).
The audio device may be realized as a portable audio player, a DVD player, a CD player, a hard disk-based media player, an Internet radio device, a public entertainment device, an MP3 player, a game console, a vehicle entertainment device, a car entertainment device, a portable video player, a mobile phone, a medical communication system, a body-worn device, and a hearing aid device. A "car entertainment device" may be a hi-fi system for an automobile.
However, although the system according to the invention primarily intends to facilitate playback of sound or audio data, it is also possible to apply the system for a combination of audio data and video data. For instance, an embodiment of the invention may be implemented in audiovisual applications such as a video player in which a loudspeaker is used, or a home cinema system.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings,
Fig. 1 shows an audio data-processing system.
Fig. 2 shows an audio data-processing device.
Fig. 3 shows an embodiment of an audio data-processing device according to the invention.
Fig. 4 shows an embodiment of an audio data-processing system according to the invention.
DESCRIPTION OF EMBODIMENTS
The illustrations in the drawings are schematic. In the different Figures, similar or identical elements are denoted by the same reference signs.
An audio data-processing system 100 will now be described with reference to Fig. 1.
The audio data-processing system 100 includes an implementation of auto DJ in a similar manner as described in US 2003/0183064. A high-level description of this auto DJ implementation follows.
The auto DJ application is a complete application that helps the user to choose music according to his preferences or the situation at hand and plays it back as a DJ would do.
Such an auto DJ system 100 comprises a user interface 101 (to enable navigation and playlist generation), a playlist generation unit 102, a database unit 103, a transition analyzer unit 104, a player unit 105, and a content analysis algorithm unit 106. Furthermore, real-time rendering algorithms may be performed in such a system.
The auto DJ application can be described with reference to the block diagram as shown in Fig. 1. The user interface 101 is implemented in the framework of a MediaBrowser and is used to obtain a human user's preferences. These preferences can be expressed in the form of genre (for instance, "I want Rock music") or in the form of selected songs (for instance, "I want music similar to Billy Jean by Michael Jackson") or other forms. The choice for the playlist generation unit 102 determines the form of these preferences, and vice versa.
The playlist generation unit 102 may take the user preferences and may generate a sequence of songs based on information coming from the database unit 103. The database unit 103 may contain two different types of metadata, namely catalog metadata and content analysis metadata. Catalog metadata may include information such as the artist's name, album name, song name, genre, etc. Catalog metadata may be available from ID3 tags (or from online third parties). Content analysis metadata may be obtained by applying content analysis algorithms directly on the music files.
In the auto DJ block diagram 100, a user enters user preferences 107 via the user interface 101. The playlist generation unit 102 receives such user preferences 107 and also receives catalog metadata and content analysis metadata 108 provided by the database unit 103. Furthermore, the database unit 103 provides content analysis metadata 109 to the transition analyzer unit 104. The database unit 103 is supplied with content analysis data 110 provided by the content analysis algorithm unit 106 which receives audio data 111 from an audio source 112 on which content is stored, such as a hard disk, a CD or a DVD. The system formed by components 112 and 106 may be operated offline. In the database unit 103, each song may have a unique identifier.
Sequence and transition information 113 is provided at an output of the transition analyzer unit 104 and is supplied to the player unit 105. Audio content 114 is provided to the player unit 105 by an audio source 115 on which content is stored, such as a hard disk, a CD or a DVD. Mixed audio 116 is provided at an output of the player unit 105.
A design issue that follows from target platform properties is to limit the number of real-time tasks for the auto DJ. Some tasks, such as time scale modification, will have to be performed in real time. Other tasks, however, need not run in real time. An example is the analysis of audio to determine the optimal position for the start of mixing two tracks. A possible way to do this is to use an idle-priority process and store essential results in the database unit 103.
The content analysis algorithms carried out by the content analysis algorithm unit 106 can be run offline or in the background. They provide new metadata used by the playlist generation unit 102 and the player unit 105, which will be added to the database unit 103, but are not directly involved in generating or rendering the playlist.
The player unit 105 takes, as an input, the generated playlist and some attributes from the database unit 103 and performs two operations: first, it determines the best transition between the songs (for instance, beat mixing, harmonic mixing, etc.) and, secondly, it implements the transitions while the music is playing. To implement a smooth transition, some additional algorithms such as time-stretching that run in real time may be needed.
The audio DJ player may be provided with a structure as shown in Fig. 2. A first song 200 is provided to a first MP3 decoder 201. A second song 202 is provided to a second MP3 decoder 203. The MP3 decoders 201, 203 decode the MPEG content 200, 202, respectively, and provide the generated audio data to respective Time Stretching Module (TSM) units 204, 205. The outputs of the two TSM units 204, 205 are supplied to an input of a mixer unit 206 that mixes the audio content and provides a mixed audio content to a gain unit 207 for adjusting the gain of the audio content to be played back. At an output of the gain unit 207, mixed song data 208 is provided for playback.
In accordance with the architecture of Fig. 2, two MP3 decoders 201, 203 and two TSM units 204, 205 are thus needed.
In this implementation, the auto DJ player unit 105 needs to decode the last part of the first song 200 together with the first part of the second song 202 during the transition, apply time-stretching (TSM) by using the TSM units 204, 205, and then perform the real mix in the mixer unit 206.
An embodiment of an audio data-processing device 300 according to the invention will now be described with reference to Fig. 3.
The device 300 for processing audio data comprises a shared mixer unit 302 adapted to mix first audio data with second audio data (for generating a mix signal provided at an output of the mixer unit 302, which signal is indicative of a mix of the first audio data with the second audio data), and comprises a shared decoder unit 303 adapted to decode the mix signal representing the mix of the first audio data with the second audio data generated by the mixer unit 302.
In more detail, parametrically (namely SSC) encoded first and second audio data items are provided by an SSC song unit 301. These first and second audio data items are provided at an input of the SSC TSM/mixer unit 302 for processing the song sequence and transition. The bit stream provided at the output of the SSC TSM/mixer unit 302 is applied to the SSC decoder unit 303, i.e. a single decoder for decoding the mixed audio content. Mixed songs 304 for reproduction by a reproduction unit (not shown in Fig. 3), such as a loudspeaker, are provided at an output of the SSC decoder unit 303.
The device 300 thus processes the first and second audio data as parametric audio data streams, namely as sinusoidally encoded audio data streams (SSC). The audio data pieces provided by the SSC song unit 301 may be provided as MPEG-4 audio data streams. The mixer unit 302 mixes the first audio data with the second audio data so as to generate a transition portion between an end portion of the first audio data and the beginning portion of the second audio data. An auto DJ implementation may thus be made possible with low computational effort. A shared time-stretching functionality may be integrated in the mixer unit 302 so that the tempo of the first audio data and the second audio data may be controlled. The system 300 may be implemented in a portable device, for instance, in a mobile phone having an auto DJ function.
The embodiment shown in Fig. 3 takes advantage of the parametric SSC encoding to avoid both the TSM and the MP3 decoding steps that are necessary in accordance with the scheme of Fig. 2.
In order to reduce the computational complexity associated with the scheme of Fig. 2, it is advantageous in the context of Fig. 3 to provide a method in which only one decoder 303 needs to be run and in which the computational load of the TSM (included in mixer unit 302) can be greatly reduced.
In accordance with Fig. 3, this is achieved by shifting the TSM and the mixer 302 to the front of the processing path and the decoder 303 to the back of the processing chain. Furthermore, this may be achieved by taking a bit stream as input, which allows efficient ways of TSM (and optionally mixing), i.e. by using parametrically encoded audio data such as MPEG-4 SSC.
Time-stretching is very efficiently realized in the SSC code simply by generating sinusoids and noise over longer frame sizes. This essentially means that no extra processing is required. Only longer or shorter frames have to be generated in the decoder unit 303, and the parametric data is read at a slower or faster pace.
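As a rough illustration of this idea, the following sketch (a hypothetical helper, not actual SSC decoder code) renders a fixed set of sinusoid parameters over a longer output frame; because the frequencies themselves are untouched, the pitch is preserved while the duration grows:

```python
import math

# Illustrative sketch of parameter-domain time-stretching: the sinusoid
# parameters are left unchanged and the decoder simply renders them over a
# longer (or shorter) output frame. Frequencies are not rescaled, so the
# pitch stays the same while the duration changes.
def render_frame(sin_params, frame_len, stretch=1.0, sample_rate=8000.0):
    """sin_params: list of (freq_hz, amplitude) pairs. stretch > 1 slows the
    audio down by producing a longer output frame from the same parameters."""
    n_out = int(round(frame_len * stretch))
    out = []
    for n in range(n_out):
        t = n / sample_rate  # real time axis: frequencies stay untouched
        out.append(sum(a * math.sin(2 * math.pi * f * t)
                       for f, a in sin_params))
    return out
```

Rendering the same parameters over twice as many samples doubles the frame duration without any separate TSM stage, which is why the TSM load becomes negligible in the parametric domain.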
The following operations are performed for mixing. To realize transitions from the first audio data to the second audio data, a gain is applied to each type of parametric data. In particular, this means scaling the amplitudes of the sinusoids, scaling the temporal envelope of the noise, and scaling the amplitude of the transient envelope. Since, due to the time scaling, the frames of both streams may run at a different rate, the parametric data is interpolated for synchronization purposes. Different synchronization strategies are possible.
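By way of illustration, the gain step can be sketched for the sinusoidal component alone (the noise and transient envelopes would be scaled analogously). The function names and the (frequency, amplitude) tuple layout are assumptions made for this sketch:

```python
# Illustrative gain step of a parametric cross-fade: during the transition,
# each stream's parameters are scaled before merging. Only the sinusoidal
# component is shown; the noise and transient envelopes would be scaled in
# the same way.
def apply_transition_gain(sinusoids, gain):
    """sinusoids: list of (freq_hz, amplitude); returns an amplitude-scaled copy."""
    return [(f, a * gain) for f, a in sinusoids]

def crossfade_frame(outgoing, incoming, progress):
    """progress runs from 0.0 (only outgoing song) to 1.0 (only incoming)."""
    faded_out = apply_transition_gain(outgoing, 1.0 - progress)
    faded_in = apply_transition_gain(incoming, progress)
    return faded_out + faded_in  # merged parameter set for the joint stream
```

Scaling amplitudes in the parameter domain replaces the sample-domain cross-fade that a waveform-based mixer would need.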
Subsequently, the two sets of sinusoids are merged. For efficiency reasons, it is advantageous to discard the irrelevant part of the sinusoids. This may be done inside the mixer unit 302 by applying a simple psycho-acoustic model (which may include psycho- acoustical knowledge or rules for eliminating inaudible components or selecting dominant components). The two noise models may be merged as well. The data describing the noise synthesis filters of both streams are recalculated to one joint noise-generating synthesis filter, and the temporal envelopes are recalculated to a single temporal envelope.
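A deliberately crude stand-in for such a psycho-acoustic model is sketched below: after merging, a component that is much weaker than a nearby dominant component is treated as masked and discarded. A real codec would use a proper masking curve; the bandwidth and ratio here are arbitrary illustrative values:

```python
# Deliberately crude stand-in for a psycho-acoustic masking model: after the
# two sinusoid sets are merged, a component far weaker than a nearby dominant
# component is assumed inaudible (masked) and discarded.
def prune_masked(sinusoids, bandwidth_hz=100.0, mask_ratio=0.1):
    """sinusoids: list of (freq_hz, amplitude). Drop a component if another
    component within bandwidth_hz is more than 1/mask_ratio times stronger."""
    kept = []
    for f, a in sinusoids:
        masked = any(abs(f - f2) <= bandwidth_hz and a < mask_ratio * a2
                     for f2, a2 in sinusoids if (f2, a2) != (f, a))
        if not masked:
            kept.append((f, a))
    return kept
```

Discarding masked components before decoding keeps the joint parameter set, and hence the decoder workload, small.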
The transient components can also be merged. However, since transients rarely occur simultaneously in both songs in the same frame, an effective way of dealing with transients is to take the strongest one whenever transients do occur simultaneously in both streams in the same frame. In all other cases, the transients of both streams are incorporated in the transient stream. The output of the mixer is a new (joint) parametric (SSC) audio stream, which can be interpreted by a standard SSC decoder.
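The per-frame transient merge described above can be sketched as follows, with a hypothetical representation in which each stream's transients are given as a mapping from frame index to transient strength.

```python
def merge_transients(trans_a, trans_b):
    """Merge two transient streams frame by frame: if both streams carry a
    transient in the same frame, keep the stronger one; otherwise pass
    through whatever transient exists. Streams are dicts mapping
    frame_index -> strength (an illustrative representation)."""
    merged = {}
    for frame in set(trans_a) | set(trans_b):
        a = trans_a.get(frame)
        b = trans_b.get(frame)
        if a is not None and b is not None:
            merged[frame] = max(a, b)      # simultaneous: strongest wins
        else:
            merged[frame] = a if a is not None else b
    return merged
```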
An embodiment of an audio data-processing system 400 according to the invention will now be described with reference to Fig. 4.
In the audio data-processing system 400, a human user may control a central control unit 402 via a user input/output interface 401. Such a user interface 401 may be a graphical user interface (GUI). The graphical user interface may include a display device (e.g. a cathode ray tube, a liquid crystal display, a plasma display device or the like) for displaying information to a human operator. Furthermore, the user interface 401 may comprise an input device allowing the user to input data (e.g. data specifying the operation mode of the system 400) or to provide the system 400 with control commands. Such an input device may include a keypad, a joystick, a trackball, a touch screen, or may even be a microphone of a voice recognition system. The user interface 401 may allow a human user to communicate in a bi-directional manner with the system 400.
Under the control of a human user operating the user I/O interface 401, the control unit 402 (e.g. a microprocessor, central processing unit, CPU) may control the functionality of the other components shown in Fig. 4.
Audio content may be stored in an audio source 403 (such as a hard disk, a CD or a DVD) and may be provided to a mixing unit 302. The mixing unit 302 may mix two songs to generate a transition in the context of an auto DJ application. The mixing unit 302 may be adapted to mix two audio data items provided in a parametric format. The output of the mixer unit 302 is supplied to the decoder unit 303 for decoding the mixed data so as to generate at its output data to be played back and representing a transition between the first and the second song. A gain unit 404 may be provided to adjust the gain of the mixed audio items. Reproducible audio data may be provided at an output of the gain unit 404 and supplied to a loudspeaker 405 for reproduction. Instead of the loudspeaker 405, earpieces or headphones may be connected.

It should be noted that use of the verb "comprise" and its conjugations does not exclude other elements, steps or features and that use of the indefinite article "a" or "an" does not exclude a plurality. Also, elements described in association with different embodiments may be combined.
It should also be noted that reference signs in the claims shall not be construed as limiting the scope of the claims.

Claims

CLAIMS:
1. A device (300) for processing audio data, the device (300) comprising a mixer unit (302) adapted to generate a mix signal which is indicative of a mix of a first audio data stream with a second audio data stream; and a shared decoder unit (303) adapted to decode the mix signal generated by the mixer unit (302).
2. The device (300) according to claim 1, adapted to process the first audio data stream with the second audio data stream being provided as parametric audio data streams.
3. The device (300) according to claim 2, adapted to process the first audio data stream with the second audio data stream being provided as sinusoidally encoded audio data streams.
4. The device (300) according to claim 1, adapted to process the first audio data stream with the second audio data stream being provided as MPEG audio data streams.
5. The device (300) according to claim 1, adapted to process the first audio data stream with the second audio data stream being provided as MPEG-4 audio data streams.
6. The device (300) according to claim 1, wherein the mixer unit (302) is adapted to generate the mix signal so as to generate a transition portion based on an end portion of the first audio data stream and based on a beginning portion of the second audio data stream.
7. The device (300) according to any one of claims 1 to 6, comprising a shared time-stretching module (302) adapted to control a tempo of the first audio data stream and the second audio data stream.
8. The device (300) according to claim 7, wherein, along an audio data-processing path, the shared time-stretching module (302) is located in advance of the shared decoder unit (303).
9. The device (400) according to claim 1, comprising a gain adjustment unit (404) adapted to adjust a gain of the mix signal.
10. The device (400) according to claim 9, wherein the gain adjustment unit (404) is adapted to adjust the mix signal with regard to at least one of the group consisting of an amplitude of sinusoids indicative of the mix signal, a temporal envelope of noise indicative of the mix signal, and a scaling of an amplitude of a transient envelope indicative of the mix signal.
11. The device (300) according to claim 1, comprising a synchronization unit adapted to synchronize the first audio data stream and the second audio data stream.
12. The device (300) according to claim 11, wherein the synchronization unit is adapted to synchronize the first audio data stream and the second audio data stream by interpolating the first audio data stream and the second audio data stream.
13. The device (300) according to claim 1, wherein the mixer unit (302) is adapted to apply psycho-acoustical knowledge or rules for eliminating inaudible components or selecting dominant components to the first audio data stream and the second audio data stream.
14. The device (300) according to claim 1, realized as a portable device.
15. The device (300) according to claim 1, realized as at least one of the group consisting of a GSM device, headphones, a gaming device, a laptop, a portable audio player, a DVD player, a CD player, a hard disk-based media player, an Internet radio device, a public entertainment device, an MP3 player, a hi-fi system, a vehicle entertainment device, a car entertainment device, a portable video player, a mobile phone, a medical communication system, a body-worn device, and a hearing aid device.
16. A method of processing audio data, the method comprising the steps of: generating a mix signal which is indicative of a mix of a first audio data stream with a second audio data stream; and decoding the mix signal by using a shared decoding unit (303).
17. A program element, which, when being executed by a processor (402), is adapted to control or carry out a method of processing audio data, the method comprising the steps of: generating a mix signal which is indicative of a mix of a first audio data stream with a second audio data stream; and decoding the mix signal by using a shared decoding unit (303).
18. A computer-readable medium, in which a computer program is stored which, when being executed by a processor (402), is adapted to control or carry out a method of processing audio data, the method comprising the steps of: generating a mix signal which is indicative of a mix of a first audio data stream with a second audio data stream; and decoding the mix signal by using a shared decoding unit (303).
PCT/IB2007/050151 2006-01-31 2007-01-17 Device for and method of processing audio data WO2007088490A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06101113.6 2006-01-31
EP06101113 2006-01-31

Publications (1)

Publication Number Publication Date
WO2007088490A1 true WO2007088490A1 (en) 2007-08-09

Family

ID=38110695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/050151 WO2007088490A1 (en) 2006-01-31 2007-01-17 Device for and method of processing audio data

Country Status (1)

Country Link
WO (1) WO2007088490A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626975B2 (en) 2011-06-24 2017-04-18 Koninklijke Philips N.V. Audio signal processor for processing encoded multi-channel audio signals and method therefor
CN106647720A (en) * 2017-01-23 2017-05-10 苏州世台防潮科技有限公司 Mobile fitting diagnosis on-board platform based on Internet

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864820A (en) * 1996-12-20 1999-01-26 U S West, Inc. Method, system and product for mixing of encoded audio signals
US20030183064A1 (en) * 2002-03-28 2003-10-02 Shteyn Eugene Media player with "DJ" mode
US6782365B1 (en) * 1996-12-20 2004-08-24 Qwest Communications International Inc. Graphic interface system and product for editing encoded audio data




Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 07700614
Country of ref document: EP
Kind code of ref document: A1