EP1938325A2 - Method and apparatus for processing audio for playback - Google Patents

Method and apparatus for processing audio for playback

Info

Publication number
EP1938325A2
Authority
EP
European Patent Office
Prior art keywords
audio
chromagrams
beginning
transition
audio track
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP06796003A
Other languages
German (de)
English (en)
French (fr)
Inventor
Steffen C. Pauws
Fabio Vignoli
Aweke N. Lemma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Priority to EP06796003A priority Critical patent/EP1938325A2/en
Publication of EP1938325A2 publication Critical patent/EP1938325A2/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/038 - Cross-faders therefor
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 - Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs

Definitions

  • the present invention relates to a method and apparatus for processing audio for playback.
  • it relates to playback of audio in which a smooth transition is provided between successive pieces of audio.
  • AutoDJ is a software function in a consumer hardware platform that has "knowledge" of music and can thus choose and mix songs from a given database. AutoDJ is not a tool that is used by human DJs to perform audio mixing; it is rather a replacement for the human DJ and operates with minimal intervention.
  • An AutoDJ does not provide mere crossfade transitions but is capable of applying different types of transitions depending on the audio content and the user preferences.
  • An AutoDJ can be divided into two parts: (1) generating playlists, sorting songs according to their degree of likeness, i.e., it has some "knowledge of music"; and (2) mixing consecutive songs and playing the mix. Mixing the songs includes the steps of computing the type and size of the transition, determining the exact mixing points and playing the music.
  • Such AutoDJ systems offer complex sound processing functionality to realize various transitions between consecutive tracks (e.g. equalization of tempo and synchronization of beat phase) as well as analysis of consecutive tracks. Such a system determines a transition based on straightforward criteria and then executes it.
  • Bpm DJ is a closed system where predefined playlists are mixed at live events.
  • the predefined mixes are based on different genres. For example, choices include a South Dakota wedding DJ mix, a Sioux Falls or Brookings mix, a Chamberlain mix, a Watertown event mix, or a prom, school dance or party mix, etc. These are all based on a known database and playlists.
  • DJ Mix Pro provides more flexibility in its choice of playlist and performs beat mixing based on the input playlist to a certain extent. However, it lacks the ability to determine important song semantics such as phrase boundaries. Automatically mixing songs to create seamless transitions between two songs is also disclosed in US2003/0183964. A drawback of the known automatic mixing methods is that the created mix transitions are often short or of poor quality.
  • a music track is a piece of audio, for example, a song which may be stored in a digital format for subsequent playback.
  • a method for processing audio for playback to provide a smooth transition between a beginning region of an audio track and an end region of a previous audio track comprising the steps of: correlating a quantity representative of a chromagram at a mixing point of said beginning region of said audio track and a quantity representative of a chromagram at a mixing point of said end region of said previous audio track; and smoothing the transition between the successive audio tracks during playback at the mixing points of said beginning region of said audio track and said end region of said previous audio track on the basis of a correlation between the quantities representative of the chromagrams.
  • a quantity representative of a chromagram may be the chromagram itself or one or more values derived from the chromagram.
  • an apparatus for processing audio for playback to provide a smooth transition between a beginning region of an audio track and an end region of a previous audio track comprising: a correlator for correlating a quantity representative of a chromagram at a mixing point of said beginning region of said audio track and a quantity representative of a chromagram at a mixing point of said end region of said previous audio track; and a smoothing unit for smoothing the transition between the successive audio tracks during playback at the mixing points of said beginning region of said audio track and said end region of said previous audio track on the basis of a correlation between the quantities representative of the chromagrams.
  • Chromagrams have turned out to be very useful for creating smooth mixes of audio tracks. The system can work with any collection of audio without any prior knowledge of the songs.
  • Chromagrams can be used to select and sort audio recordings in a playlist in such a way that each pair of successive recordings have similar harmonic or chordal contexts at their mixing points. Such an optimal arrangement of audio recordings may be achieved by finding the maximum correlation between the chromagrams of the mixing points of any two successive audio recordings in a playlist.
  • a plurality of audio tracks are selected for playback in an order based on the correlation of the chromagrams of the beginning and end regions of successive audio tracks. More preferably the order of playback is determined to optimise correlation of the chromagrams of the beginning and end regions of successive audio tracks.
  • the order of the playlist may be determined on the basis of a local search method, wherein a penalty indicating whether a sequence of audio tracks meets predefined constraints is calculated, and the sequence of audio tracks which has the least penalty is derived iteratively.
  • the penalty may be calculated on the basis of the correlation between the chromagrams of the audio tracks.
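  • as an illustration of the local-search idea above, the following Python sketch orders a playlist by random pairwise swaps, keeping only swaps that reduce a penalty taken here as the sum of (1 - correlation) over consecutive outro/intro chromagram pairs; the penalty form, the swap neighbourhood, the iteration count and all function names are illustrative assumptions rather than the patent's exact procedure.
```python
import numpy as np

def pearson(x, y):
    """Pearson product-moment correlation between two 12-bin chromagrams."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def playlist_penalty(order, intros, outros):
    """Assumed penalty: sum of (1 - correlation) between the outro chromagram of
    each track and the intro chromagram of the track that follows it."""
    return sum(1.0 - pearson(outros[a], intros[b]) for a, b in zip(order, order[1:]))

def local_search_order(intros, outros, iterations=2000, seed=0):
    """Swap-based local search: keep a random pairwise swap only if it lowers
    the penalty, and return the best order found."""
    rng = np.random.default_rng(seed)
    order = list(range(len(intros)))
    best = playlist_penalty(order, intros, outros)
    for _ in range(iterations):
        i, j = rng.choice(len(order), size=2, replace=False)
        order[i], order[j] = order[j], order[i]
        penalty = playlist_penalty(order, intros, outros)
        if penalty < best:
            best = penalty
        else:
            order[i], order[j] = order[j], order[i]  # revert a non-improving swap
    return order, best
```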
  • the duration of the mixed transition between successive audio tracks may be determined on the basis of the correlation of the chromagrams of the beginning and end regions of said successive audio tracks.
  • the chromagrams are computed by harmonically compressing the amplitude FFT-based spectra of the content of each audio track over a predetermined number of octaves. The number of octaves may be six. The harmonically compressed amplitude spectrum of each audio track is then filtered by multiplying the spectrum by a Hamming window. The values of the amplitude spectrum are extracted at and around the spectral peaks.
  • a chromagram is an encoding of the likelihood of all 12 chromas in music audio.
  • a chroma is a scale position category of a note, represented by the note name (e.g., 'C', 'C#', 'D', ...), disregarding its octave. So, two pitches that are an octave apart share the same chroma but differ in pitch height. Chroma is thus cyclic in nature by octave periodicity. In this way, the chromagram summarizes the harmonic/chordal content of a music sample in a very compact form as a 12-element feature vector.
  • Chromagrams of major keys have the highest occurrence for their tonic and the other two pitches of the major triad (major third, perfect fifth), followed by the remaining pitches of the major scale, and concluded by the non-scale pitches.
  • chromagrams of minor keys show the corresponding pattern for their minor chordal triad.
  • chromagrams for different major or minor keys are all transpositions of each other. For instance, the chromagram for C major can be shifted seven positions to arrive at a chromagram for G major. This makes the Pearson product-moment correlation between the chromagrams of two audio samples an excellent candidate for computing the harmonic similarity between the two audio samples.
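  • a minimal Python sketch of this similarity measure is given below; the Pearson correlation formula is standard, while the example key-profile values and the choice of C as chroma index 0 are illustrative assumptions, not values from the patent.
```python
import numpy as np

def pearson(x, y):
    """Pearson product-moment correlation of two 12-element chromagram vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Hypothetical C-major key profile (chroma index 0 = C): tonic and triad weighted
# most, then the remaining scale tones, then the non-scale tones.
c_major = np.array([5.0, 0.5, 2.0, 0.5, 3.0, 2.0, 0.5, 4.0, 0.5, 2.0, 0.5, 2.0])

# A G-major profile is (ideally) the C-major profile rotated by 7 chroma steps,
# i.e. transposed up a perfect fifth.
g_major = np.roll(c_major, 7)

# Harmonic similarity between C-major and G-major material:
print(pearson(c_major, g_major))
```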
  • Figure 1 is a simple schematic of a known automatic DJ system
  • Figure 2 is a graphical representation of typical mixing materials of an audio track
  • Figure 3 is a simple schematic of an AutoDJ system according to an embodiment of the present invention.
  • Figure 4 is a simple schematic of a mixer of the AutoDJ system of Fig. 3.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • a basic schematic of a known AutoDJ is shown in Figure 1. It comprises a song database 101 which outputs songs to a content analyzer 103 and a player 105.
  • the content analyzer 103 analyzes the songs stored in the database 101 to extract parameters that represent the rhythmical and perceptual properties of the content. These include, among other things, song segmentation parameters, tempo and beat locations (onsets), harmonic signature, etc. These parameters (collectively referred to as AutoDJ meta-information) are conveniently computed offline and stored in or added to a linked feature database 107.
  • a playlist generator 109 is fed user preferences and, using the database 101, creates a suitable playlist. Given such a playlist, a transition planner 111 compares the AutoDJ meta-information of consecutive songs and generates a sequence of commands describing how the songs should be mixed.
  • the player 105 streams the songs from the database 101 into the output-rendering device 113 (e.g. loudspeaker) executing the sequence of commands dictating how the songs should be mixed and played back in a rhythmically consistent and smooth way.
  • Another known method of generating a suitable playlist in an AutoDJ is the use of a linear combination of Gaussian kernel functions to model user preferences.
  • the model attempts to learn user preferences by expressing similarities between song metadata using the kernel functions. Once the AutoDJ is trained, the learned behavior is directly applied to other, larger sets of songs.
  • the fundamental assumption is that the metadata consistently summarizes the song it is linked to.
  • Figure 2 illustrates the typical construction of a music track.
  • the structure of the music track is merely an example and the type of transition to be implemented is dependent on where the mix is applied, or conversely the intervals used can depend on the envisaged type of transition.
  • the music track (or song) can be broken down into three main parts, namely, Intro, Meat and Outro. These and several other regions can be defined as follows:
  • First audible (Fade-in moment) 201 The location at which the track just exceeded the hearing threshold for the first time;
  • Intro 202 This is used for analysis purposes only. It is used as an anchor for the blend-in moment pointer. It reduces the probability that part of the intro is in the beat mix transition;
  • Blend-in moment 203 This identifies the location of a beat onset that, in case of a beat mix, will be synchronized to the blend-out moment of the previous track in the playlist.
  • Blend-out moment 204 This identifies the location of a beat onset that, in case of a beat mix, will be synchronized to the blend-in moment of the next track in the playlist.
  • Start of Outro 205 This is used for analysis purposes only. It is used as an anchor for the blend-out moment pointer. It reduces the probability that part of the outro is in the beat mix transition;
  • Last audible (Fade-out moment) 206 The location at which the track just exceeded the hearing threshold for the last time;
  • Fade-In Area (Area A): Area in which the transition type fade-in is applied. It is entirely situated in the intro of the song and extends starting from the Fade-In moment 201. Its actual duration will depend on the characteristics of the preceding song.
  • Blend-In Area (Area B): Region in which a beat-mix with the previous song can take place. It is fully in the Meat of the song and extends starting from the Blend-In moment 203. Its exact duration depends on the characteristics of the preceding song.
  • Blend-Out Area (Area C): Region in which a beat-mix with the next song can take place. It is fully in the Meat of the song and extends up to the Blend-Out moment 204. Its actual duration will depend on the characteristics of the next song.
  • Fade-Out Area (Area D): Area in which the transition type fade-out is applied. It is entirely in the outro part of the song and extends up to the Fade-out moment 206. Its actual duration will depend on the characteristics of the next song.
  • AutoDJ meta-information is stored in a database. Outside these regions, a fancy mix can be implemented if real-time computation of meta-information for an arbitrary region is possible. When the meta-information is not available, the AutoDJ of the preferred embodiment could utilize a simple CD-style transition.
  • the first step in the AutoDJ system is to extract signal features that enable automatic selection and sorting of contents.
  • two categories of AutoDJ features are identified, namely, the set of features that are necessary to make artistically consistent mixes (referred to as playlist features) and the set of features that are necessary to perform rhythmically consistent mixes (referred to as rhythmical features).
  • the playlist features are sets of features that are used to construct a meaningful (artistically consistent) collection of songs satisfying user criteria.
  • Playlist generation can be commenced based on metadata delivered along with the records.
  • metadata is in most cases manually compiled and is based on some intuitive knowledge of the circumstances of the song such as genre of the artist.
  • Commonly provided metadata include publication year, artist name, genre tag etc.
  • the metadata-based playlist generation method basically assumes that the compiled metadata correctly describes the song it is linked to. This assumption is, however, very unlikely to be fulfilled because the metadata attached to ripped contents are in most cases arbitrarily filled in and do not necessarily represent the behavior of the song. Thus, the link between song feature and metadata could lead to a flawed model.
  • Another way of generating playlists is based on low-level features that are extracted using some signal processing tools to automatically generate metadata from the content itself. This allows classification of the songs. This has the advantage that song similarities are measured using objective quantities and thus has the potential of resulting in a consistent model.
  • Two approaches for playlist generation are utilized: a classification-based approach and a similarity-measure-based approach. In the first approach, a set of features is first extracted and, based on these features, a model is derived and trained to perform classification and automatic labeling of songs. Once the songs are labeled, the metadata is used to generate a playlist for mixing. As mentioned above, one known method is local search. The second approach is based on the similarity of songs according to a certain objective distance measure. The idea here is that, given an objective distance measure and a "seed" song, similar songs are collected and sorted based on their degree of likeness.
  • Rhythmical features are obvious song features that are easy to model. These are generally clear objective concepts such as tempo, beat phase, meter and phrase boundary etc.
  • at the lowest level of song semantics there are the beats of a song, with consecutive beats separated by a beat period. The frequency of the beats is referred to as the tempo of the song.
  • a set of beats forms the meter of a song. The number of beats per meter depends on the genre of the song. In dance music, for example, there are four beats per meter.
  • On a higher abstraction level there is the phrase of a song. This is generally a collection of four meters and usually coincides with a start of the vocal phrase in a song.
  • the human DJ always tries to align phrase boundaries of songs to be mixed.
  • phrase detection and alignment require a deeper music feel and are often difficult to model. In such cases, meter-aligned beat mixing could be sufficient.
  • phrase alignment is ideal, meter alignment is workable and beat alignment is not sufficient for rhythmically acceptable music mixing.
  • the AutoDJ 501 comprises a first mixer 503 and a second mixer 505.
  • the pair of input terminals of each mixer are connected to respective input terminals 507a, 507b, 507c, 507d of the AutoDJ 501.
  • Each of the mixers 503 and 505 is connected to a control terminal 509 of the AutoDJ 501.
  • the outputs of the mixers 503, 505 are connected to respective inputs of an adder 511.
  • the output of the adder 511 is connected to the output terminal 513 of the AutoDJ 501.
  • a first pair of input terminals 507a, 507b, are connected to the output of a first low pass filter 515 and a first high pass filter 517.
  • a second pair of input terminals 507c, 507d are connected to the output of a second low pass filter 519 and a second high pass filter 521.
  • the first low pass filter 515 and the first high pass filter 517 decompose the first input signal x[n] into its two complementary components x_L[n] and x_H[n].
  • the second low pass filter 519 and the second high pass filter 521 decompose the second input signal y[n] into its two complementary components y_L[n] and y_H[n], respectively.
  • these are the low frequency (bass) and the high frequency (treble) components.
  • the first mixer 503 is applied to mix the corresponding frequency components of the two signals, x_L[n] and y_L[n].
  • the second mixer 505 is applied to mix the corresponding frequency components of the two signals, x_H[n] and y_H[n].
  • the transition profiles of the first and second mixers 503, 505 are similar to that shown in Figure 4.
  • the outputs z_L[n] and z_H[n] of the mixers 503, 505 are then summed by the adder 511 to generate the mixed output signal z[n], which is output on the output terminal 513 of the AutoDJ 501.
  • the control signal input on the control terminal 509 of the AutoDJ 501 dictates how the mixing in the two mixers 503, 505 is performed and contains information of the location of the mixing points and the size of the overlap.
  • a number of frequency bands, each with its own transition profile, can be chosen.
  • the transition profile in each frequency band can vary from zero overlap to very large overlap.
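  • the two-band mixing structure described above can be sketched in Python as follows; the 200 Hz crossover, the Butterworth filters, and the equal-power/linear fade choices are assumptions made for illustration, not values taken from the patent.
```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_bands(signal, fs, cutoff=200.0):
    """Decompose a signal into complementary low (bass) and high (treble) bands."""
    sos_lo = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    sos_hi = butter(4, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lo, signal), sosfiltfilt(sos_hi, signal)

def band_crossfade(x_band, y_band, overlap, fade_in_curve):
    """Crossfade one band of the outgoing track into the same band of the
    incoming track over `overlap` samples."""
    t = np.linspace(0.0, 1.0, overlap)
    fade_in, fade_out = fade_in_curve(t), fade_in_curve(1.0 - t)
    mixed = x_band[-overlap:] * fade_out + y_band[:overlap] * fade_in
    return np.concatenate([x_band[:-overlap], mixed, y_band[overlap:]])

def two_band_mix(x, y, fs, overlap, cutoff=200.0):
    """Mix bass and treble with independent transition profiles, then sum them."""
    x_lo, x_hi = split_bands(x, fs, cutoff)
    y_lo, y_hi = split_bands(y, fs, cutoff)
    z_lo = band_crossfade(x_lo, y_lo, overlap, np.sqrt)        # equal-power bass fade
    z_hi = band_crossfade(x_hi, y_hi, overlap, lambda t: t)    # linear treble fade
    return z_lo + z_hi                                         # z[n] = z_L[n] + z_H[n]
```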
  • a more detailed description of the mixers 503, 505 is provided with reference to Fig. 4. It is understood that the first and second mixer 503, 505 may be substantially similar and for simplicity only the first mixer 503 is shown and described here.
  • the first mixer 503 comprises a phase comparator 601.
  • the inputs of the phase comparator 601 are connected to the input terminals 603, 605 of the mixer 503.
  • the input terminals 603, 605 of the mixer 503 are also connected to the input of respective first and second delay elements 607, 609.
  • the delay elements 607, 609 are controlled by a control signal C generated by the phase comparator 601.
  • the output of the first delay element 607 is connected to the input of a first gain element 611.
  • the output of the second delay element 609 is connected to the input of a second gain element 613.
  • the outputs of the first and second gain elements 611, 613 are connected to respective inputs of an adder 615.
  • the output of the adder 615 is connected to the output terminal 617 of the mixer 503.
  • the input signals x_L[n] and y_L[n] are placed on the input terminals 603 and 605 respectively.
  • the phases of x_L[n] and y_L[n] are compared in the phase comparator 601.
  • the output of the comparator 601 is a control signal C that controls the delay elements 607, 609 so as to minimize the phase conflict during addition.
  • the delay elements 607, 609 are changed in a graceful way.
  • the gain elements 611, 613 implement a cross-fading profile. In this way the issue of phase conflicts between, in this case, the bass components of the signals to be mixed is compensated.
  • the gain of the gain elements 611, 613 is controlled by the control signal input on the control terminal 509 of the AutoDJ 501.
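  • a simplified Python sketch of this phase-compensated bass mixing is shown below; the exhaustive search over sample offsets, the maximum offset of 64 samples, and the linear gain ramps are illustrative assumptions, and only the incoming signal is shifted here for brevity (the patent's mixer adjusts both delay elements gracefully).
```python
import numpy as np

def best_offset(x_lo, y_lo, max_offset=64):
    """Find the sample offset of the incoming bass that best aligns its phase
    with the outgoing bass, by maximising the inner product."""
    best_d, best_score = 0, -np.inf
    n = min(len(x_lo), len(y_lo)) - max_offset
    for d in range(max_offset + 1):
        score = float(np.dot(x_lo[:n], y_lo[d:d + n]))
        if score > best_score:
            best_d, best_score = d, score
    return best_d

def phase_aligned_crossfade(x_lo, y_lo, overlap, max_offset=64):
    """Shift the incoming bass to minimise phase conflict, then apply
    complementary gain ramps (the crossfade profile) and add.
    Assumes len(y_lo) >= overlap + max_offset."""
    x_tail, y_head = x_lo[-overlap:], y_lo[:overlap + max_offset]
    d = best_offset(x_tail, y_head, max_offset)
    y_aligned = y_head[d:d + overlap]
    fade_out = np.linspace(1.0, 0.0, overlap)
    return x_tail * fade_out + y_aligned * (1.0 - fade_out)
```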
  • A chromagram is an abstraction of the time-varying spectrum of the audio signal based on the perceptual organization of pitch, where the highly redundant octave relationships are discounted and emphasis is given to the pitch structure. A chromagram is a representation of the musical key of a song and is based on the frequency-to-key mapping shown in Table 1.
  • the chromagram of a song is computed by taking the normalized cumulative energy of the signal for the 12 octave bins in the frequency range 0 to 20 kHz.
  • let x(f) be the frequency response of the audio signal,
  • and let f1_kj and f2_kj represent the lower and upper bounds of the j-th frequency range in the k-th octave bin.
  • f1_kj and f2_kj are such that the width of each spectral region is half a semitone around the centre frequency given in Table 1. Then the k-th chromagram value C_k is given by the normalised cumulative energy in those regions, C_k = Σ_j ∫ |x(f)|² df, where each integral runs from f1_kj to f2_kj.
  • the chromagram x of a song is the histogram constructed by collecting the 12 chromagram values into a single vector, namely x = [C_1, C_2, ..., C_12].
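  • a direct, single-FFT Python sketch of this energy-based chromagram is given below; the choice of C1 = 32.70 Hz as the lowest centre frequency, six octaves, and the half-semitone band interpretation are assumptions consistent with the description elsewhere in this document rather than a restatement of Table 1.
```python
import numpy as np

def energy_chromagram(signal, fs, n_octaves=6, f_c1=32.70):
    """12-bin chromagram as normalised cumulative spectral energy: chroma bin k
    sums |x(f)|^2 over half-semitone-wide bands around its centre frequency in
    every octave, and the resulting vector is normalised to unit sum."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2          # |x(f)|^2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    chroma = np.zeros(12)
    for k in range(12):                                  # 0 = C, 1 = C#, ..., 11 = B
        for octave in range(n_octaves):
            centre = f_c1 * 2.0 ** (octave + k / 12.0)
            lo, hi = centre * 2 ** (-1 / 48), centre * 2 ** (1 / 48)
            band = (freqs >= lo) & (freqs < hi)          # half a semitone wide
            chroma[k] += spectrum[band].sum()
    total = chroma.sum()
    return chroma / total if total > 0 else chroma
```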
  • the duration of the mix (the size of the overlap) is proportional to the similarity between the chromagrams of the two songs to be mixed.
  • the similarity between the two chromagrams is defined by the Pearson product-moment correlation p(x, y) of two chromagram vectors x, y as: p(x, y) = Σ_k (x_k - mx)(y_k - my) / sqrt( Σ_k (x_k - mx)² · Σ_k (y_k - my)² ), where mx and my are the means of x and y.
  • K is a constant which maps the correlation value into a time duration. K is typically measured in terms of beats and may have a value in the range of 16-32 beats (i.e. a maximum overlap of 1 to 2 phrases). When the chromagrams of the songs to be mixed are far apart, the mix interval is short and a less pleasant mix may be generated. To overcome this problem, it may be desirable to bridge the two contents with an artificially generated transition pattern. This pattern is generally a percussion pattern, which can fit to any chromagram.
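  • the mapping from correlation to overlap length can be sketched as below; K = 32 beats is taken from the upper end of the range mentioned above, while clipping negative correlations to zero and the four-beat floor are illustrative assumptions.
```python
import numpy as np

def mix_duration_beats(chroma_a, chroma_b, k_beats=32, min_beats=4):
    """Overlap length (in beats) proportional to the chromagram correlation of
    the two mixing points: duration = K * correlation, with K = 32 beats here."""
    rho = float(np.corrcoef(chroma_a, chroma_b)[0, 1])   # Pearson correlation
    return max(int(round(k_beats * max(rho, 0.0))), min_beats)
```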
  • the percussion pattern is referred to as a uni-key signal as it has a more or less flat spectrum with respect to the octave bins and thus gives almost a constant correlation value with all kinds of songs. Given songs with chromagram mismatch, artificial patterns are inserted between the two songs to enable a smoother transition.
  • the artificially generated transition pattern is used to bridge this mismatch.
  • the percussion pattern could be designed in such a way that its chromagram gradually transits from close to song A to close to song B or it could be chosen to have uni-key as discussed above.
  • an artificial pattern can also be used to match very different tempos and songs with entirely different styles. The latter could for instance be achieved by introducing sound effects such as breaks between the songs to be mixed.
  • the AutoDJ meta-information is derived from a database of songs.
  • the AutoDJ meta-information, namely tempo and beat onsets, bar boundaries, beat strengths, and spectral shape parameters (chromagram), is computed offline and stored in a linked database as described above with reference to Fig. 1.
  • other (so-called non-AutoDJ) meta-information such as genre, era, artist and low-level features that are used for an objective likeness measure (referred to as likeness-parameters) are also gathered.
  • the automatically generated playlists are fed to a transition planner that analyzes the AutoDJ meta-information of the analysis areas that are used in each transition. From this analysis, it determines the preferred type of transition and the parameters required to perform it. The generated sets of commands are then executed by the player to perform the required mixes.
  • a preferred technique to optimise the order of songs in a playlist to provide the required smooth transitions is use of a local search based method.
  • HarmonicSimilar(i, j), 1 ≤ i ≤ j ≤ N: S_i.x ≈ S_j.y holds,
  • where S_i.x represents the chromagram at mixing point x of song S_i.
  • the mixing point may, for example, comprise the outro and the intro (say, the last and first 10 seconds) of two successive songs.
  • the approximation above between chromagrams has to be further operationalised into a penalty function. This penalty function tells, as a numerical value, how well one chromagram approximates the other.
  • the problem can also be formulated as a maximization problem.
  • the method of the preferred embodiment of the present invention is to compute the chromagram for each song (or for each relevant mixing point of each song). Mixing points may be the intros and outros of songs. This computation can be either done off-line or on-line. For speed purposes, however, the computation is preferably carried out off-line.
  • the chromagram is defined as the restructuring of a spectral representation in which the frequencies are mapped onto a limited set of 12 chroma values in a many-to-one fashion. This is done by assigning frequencies to the 'bin' that represents the ideal chroma value of the equally tempered scale for that frequency, for example, as shown in Table 1 above.
  • the 'bins' correspond to the twelve chromas in an octave.
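  • the many-to-one frequency-to-chroma mapping can be written compactly as below; using C1 = 32.70 Hz as the reference frequency and rounding to the nearest equal-tempered semitone are assumptions consistent with the frequencies cited later in this description, not a restatement of Table 1.
```python
import numpy as np

def chroma_bin(freq_hz, f_c1=32.70):
    """Many-to-one mapping of a frequency onto one of the 12 chroma bins of the
    equal-tempered scale (0 = C, 1 = C#, ..., 11 = B), ignoring the octave."""
    semitones_above_c1 = 12.0 * np.log2(freq_hz / f_c1)
    return int(round(semitones_above_c1)) % 12

# C4 (261.6 Hz), C5 (523.3 Hz) and C6 (1046.5 Hz) all land in the same bin:
assert chroma_bin(261.6) == chroma_bin(523.3) == chroma_bin(1046.5) == 0
```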
  • a sum of harmonically compressed amplitude FFT-based spectra is used, in which the spectral content above 5 kHz is cut off by down-sampling the signal. It is assumed that harmonics in the higher frequency regions do not contribute significantly to the pitches in the lower frequency regions. Preferably, only a limited number of harmonically compressed spectra (say, 15) are added.
  • only the spectral components (i.e., the peaks) are taken into account. Spectral components at higher frequencies contribute less to pitch than spectral components at lower frequencies.
  • the frequency abscissa is transformed to a logarithmic one by means of interpolation, since human pitch perception follows logarithmic laws.
  • 171 points per octave are interpolated over 6 octaves (from 25 Hz to 5 kHz) by a cubic spline method. This is required to achieve a higher resolution for going from the linear to the logarithmic frequency domain and to prevent numerical instability.
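  • the interpolation step can be sketched as follows; SciPy's cubic spline is used as a stand-in for the cubic spline method mentioned above, and clipping negative interpolated amplitudes to zero is an added assumption.
```python
import numpy as np
from scipy.interpolate import CubicSpline

def to_log_frequency_axis(freqs, amplitudes, f_lo=25.0, f_hi=5000.0,
                          points_per_octave=171):
    """Resample a linear-frequency amplitude spectrum onto a logarithmic axis
    (171 points per octave between 25 Hz and 5 kHz) with a cubic spline.
    `freqs` must be strictly increasing and cover the [f_lo, f_hi] range."""
    n_octaves = np.log2(f_hi / f_lo)
    n_points = int(round(points_per_octave * n_octaves))
    log_freqs = f_lo * 2.0 ** (np.arange(n_points) / points_per_octave)
    spline = CubicSpline(freqs, amplitudes)
    return log_freqs, np.clip(spline(log_freqs), 0.0, None)
```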
  • a weighting function is used to model human auditory sensitivity; the perceived loudness of a pitch depends on its frequency. For instance, the loudness of pitches with equal amplitude starts to drop steeply at frequencies lower than 400 Hz.
  • Harmonic compression means that the linear frequency abscissa is multiplied by an integral factor n. In the logarithmic frequency domain, this multiplication is equal to a shift (or addition).
  • the compression rank n refers to the index of the harmonic that will be resolved.
  • the number of compressions carried out amounts to the number of harmonics that is investigated. All these compressed spectrum representations are added; a sequence of decreasing factors is used so that higher harmonics contribute less to pitch than the lower harmonics do.
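  • in code, harmonic compression on the logarithmic axis reduces to shifting and adding weighted copies of the spectrum, as sketched below; the geometric weight decay of 0.8 per harmonic is an illustrative assumption (the description above only states that a decreasing sequence of factors is used).
```python
import numpy as np

def harmonic_compression_sum(log_spectrum, points_per_octave=171,
                             n_harmonics=15, decay=0.8):
    """Sum of harmonically compressed spectra on a logarithmic frequency axis.
    Compressing by rank n is a shift of log2(n) octaves; the shifted copies are
    added with decreasing weights so higher harmonics contribute less."""
    out = np.zeros_like(log_spectrum)
    for n in range(1, n_harmonics + 1):
        shift = int(round(points_per_octave * np.log2(n)))
        if shift >= len(log_spectrum):
            break
        out[:len(log_spectrum) - shift] += (decay ** (n - 1)) * log_spectrum[shift:]
    return out
```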
  • the input signal is partitioned into non-overlapping time frames of 100 milliseconds. If the signal is in stereo format, a mono version is created by averaging both channels first.
  • the length of a frame is inspired, on one hand, by the minimal duration of a note in a music performance with a usable global tempo (30-300 bpm; or between 5 events per second and 1 event every 2 seconds) and, on the other hand, by the fact that long frames are computationally too intensive.
  • a low-pass filtering of at least 10 kHz and a decimation process band-limits and downsamples the signal by a particular factor. The low-pass filtering is done by an FIR approximation of an ideal low-pass filter. This down-sampling dramatically decreases the required computing time without seriously affecting the results.
  • the 'remaining' samples in a frame are multiplied by a Hamming window, zero-padded, and the amplitude spectrum is calculated from a 1024-point FFT.
  • This spectrum consists of 512 points spaced 4.88 Hz on a linear frequency scale.
  • a procedure is applied aiming at enhancing the peaks without seriously affecting the frequencies or their magnitudes. Only values at and around the spectral peaks are taken into account, by setting all values at points that are more than two FFT points (9.77 Hz) away from a relative maximum equal to 0.
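  • a small Python sketch of this peak-enhancement step is given below; the simple three-point local-maximum test is an assumption (any peak picker would do), while the two-bin neighbourhood follows the description above.
```python
import numpy as np

def enhance_peaks(spectrum, keep_bins=2):
    """Keep only values at and around spectral peaks: every point more than
    `keep_bins` FFT points away from a relative maximum is set to zero."""
    is_peak = np.zeros(len(spectrum), dtype=bool)
    is_peak[1:-1] = (spectrum[1:-1] >= spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    mask = np.zeros(len(spectrum), dtype=bool)
    for idx in np.flatnonzero(is_peak):
        mask[max(0, idx - keep_bins):idx + keep_bins + 1] = True
    return np.where(mask, spectrum, 0.0)
```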
  • the resulting spectrum is then smoothed using a Hanning filter.
  • the values of the spectrum on a logarithmic frequency scale are calculated at 171 points per octave (interpolation).
  • the interpolated spectrum is multiplied by a raised arctangent function, mimicking the sensitivity of the human auditory system for frequencies below 1250 Hz.
  • the chromagram for each frame is computed by locating the spectral regions in the harmonically compressed spectrum that correspond with each chroma in equal temperament. For the pitch class C, this comes down to the six spectral regions centred around the pitch frequencies for C1 (32.7 Hz), C2 (65.4 Hz), C3 (130.8 Hz), C4 (261.6 Hz), C5 (523.3 Hz) and C6 (1046.5 Hz). The width of each spectral region is a half semitone from this centre. The amplitudes in all six spectral regions are added to form one chroma region. Then, the norm H of the amplitudes that fall within a chroma region and the norm R of all amplitudes that do not fall within a chroma region are taken. Calculating the quotient H/R provides the likelihood of that chroma. Adding and normalizing the chromagrams over all frames results in a chromagram for the complete music sample.
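  • the per-frame likelihood computation and the averaging over frames can be sketched as follows; interpreting the norms H and R as Euclidean norms, and using C1 = 32.70 Hz with half-semitone-wide regions over six octaves, are assumptions consistent with the surrounding text rather than the patent's exact formulation.
```python
import numpy as np

def frame_chroma_likelihoods(compressed_spectrum, log_freqs,
                             f_c1=32.70, n_octaves=6):
    """Per-frame chroma likelihoods: for each chroma, take the norm H of the
    amplitudes inside half-semitone-wide regions around its centre frequencies
    in every octave, the norm R of all amplitudes outside those regions, and
    use the quotient H / R as the likelihood of that chroma."""
    likelihoods = np.zeros(12)
    for k in range(12):
        in_region = np.zeros(len(log_freqs), dtype=bool)
        for octave in range(n_octaves):
            centre = f_c1 * 2.0 ** (octave + k / 12.0)
            in_region |= ((log_freqs >= centre * 2 ** (-1 / 48)) &
                          (log_freqs < centre * 2 ** (1 / 48)))
        h = np.linalg.norm(compressed_spectrum[in_region])
        r = np.linalg.norm(compressed_spectrum[~in_region])
        likelihoods[k] = h / r if r > 0 else 0.0
    return likelihoods

def song_chromagram(per_frame_likelihoods):
    """Add the per-frame chromagrams and normalise to obtain one chromagram
    for the complete music sample."""
    total = np.sum(per_frame_likelihoods, axis=0)
    return total / total.sum() if total.sum() > 0 else total
```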
  • although the preferred embodiment refers to a specific correlation technique to correlate the chromagrams at the mixing points, the present invention is not limited to such a technique; the correlation of the chromagrams at the mixing points is intended to include any other associative or distance measure, such as the chi-square distance, Euclidean distance, an entropy measure, a distribution measure or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Auxiliary Devices For Music (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
EP06796003A 2005-09-30 2006-09-12 Method and apparatus for processing audio for playback Withdrawn EP1938325A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06796003A EP1938325A2 (en) 2005-09-30 2006-09-12 Method and apparatus for processing audio for playback

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP05109080 2005-09-30
EP06796003A EP1938325A2 (en) 2005-09-30 2006-09-12 Method and apparatus for processing audio for playback
PCT/IB2006/053230 WO2007036824A2 (en) 2005-09-30 2006-09-12 Method and apparatus for processing audio for playback

Publications (1)

Publication Number Publication Date
EP1938325A2 true EP1938325A2 (en) 2008-07-02

Family

ID=37757102

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06796003A Withdrawn EP1938325A2 (en) 2005-09-30 2006-09-12 Method and apparatus for processing audio for playback

Country Status (6)

Country Link
US (1) US8069036B2 (ja)
EP (1) EP1938325A2 (ja)
JP (1) JP2009510658A (ja)
KR (1) KR20080066007A (ja)
CN (1) CN101278349A (ja)
WO (1) WO2007036824A2 (ja)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007072350A2 (en) * 2005-12-22 2007-06-28 Koninklijke Philips Electronics N.V. Electronic device and method for determining a mixing parameter
US7888582B2 (en) * 2007-02-08 2011-02-15 Kaleidescape, Inc. Sound sequences with transitions and playlists
KR20090118752A (ko) * 2008-05-14 2009-11-18 삼성전자주식회사 컨텐트 재생 목록 제공 방법 및 그 장치
KR20100071314A (ko) * 2008-12-19 2010-06-29 삼성전자주식회사 영상처리장치 및 영상처리장치의 제어 방법
US8422699B2 (en) * 2009-04-17 2013-04-16 Linear Acoustic, Inc. Loudness consistency at program boundaries
US20110231426A1 (en) * 2010-03-22 2011-09-22 Microsoft Corporation Song transition metadata
JP5598536B2 (ja) * 2010-03-31 2014-10-01 富士通株式会社 帯域拡張装置および帯域拡張方法
US8380334B2 (en) 2010-09-07 2013-02-19 Linear Acoustic, Inc. Carrying auxiliary data within audio signals
US20130275421A1 (en) 2010-12-30 2013-10-17 Barbara Resch Repetition Detection in Media Data
EP2659483B1 (en) * 2010-12-30 2015-11-25 Dolby International AB Song transition effects for browsing
EP2485213A1 (en) * 2011-02-03 2012-08-08 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Semantic audio track mixer
US9070352B1 (en) 2011-10-25 2015-06-30 Mixwolf LLC System and method for mixing song data using measure groupings
US9111519B1 (en) 2011-10-26 2015-08-18 Mixwolf LLC System and method for generating cuepoints for mixing song data
US9576050B1 (en) * 2011-12-07 2017-02-21 Google Inc. Generating a playlist based on input acoustic information
CN105612510B (zh) * 2013-08-28 2018-11-13 兰德音频有限公司 用于使用语义数据执行自动音频制作的系统和方法
US10219029B1 (en) 2014-03-12 2019-02-26 Google Llc Determining online content insertion points in an online publication
US9269339B1 (en) * 2014-06-02 2016-02-23 Illiac Software, Inc. Automatic tonal analysis of musical scores
SE1451583A1 (en) * 2014-12-18 2016-06-19 100 Milligrams Holding Ab Computer program, apparatus and method for generating a mix of music tracks
US10147407B2 (en) 2016-08-31 2018-12-04 Gracenote, Inc. Characterizing audio using transchromagrams
US20180315407A1 (en) * 2017-04-28 2018-11-01 Microsoft Technology Licensing, Llc Automatic Music Mixing
US11443724B2 (en) * 2018-07-31 2022-09-13 Mediawave Intelligent Communication Method of synchronizing electronic interactive device
US11972746B2 (en) * 2018-09-14 2024-04-30 Bellevue Investments Gmbh & Co. Kgaa Method and system for hybrid AI-based song construction
EP4115628A1 (en) * 2020-03-06 2023-01-11 algoriddim GmbH Playback transition from first to second audio track with transition functions of decomposed signals
CN112735479B (zh) * 2021-03-31 2021-07-06 南方电网数字电网研究院有限公司 语音情绪识别方法、装置、计算机设备和存储介质

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08254985A (ja) * 1995-03-17 1996-10-01 Pioneer Electron Corp 音楽再生制御装置及び音楽再生装置
US6533969B1 (en) * 1998-06-12 2003-03-18 Jeneric/Pentron, Inc. Method of making high-strength dental restorations
US8326584B1 (en) 1999-09-14 2012-12-04 Gracenote, Inc. Music searching methods based on human perception
JP3687467B2 (ja) * 2000-02-25 2005-08-24 ティアック株式会社 記録媒体再生装置
US6542869B1 (en) * 2000-05-11 2003-04-01 Fuji Xerox Co., Ltd. Method for automatic analysis of audio including music and speech
JP3797547B2 (ja) * 2001-03-21 2006-07-19 松下電器産業株式会社 プレイリスト生成装置、オーディオ情報提供装置、オーディオ情報提供システムおよびその方法、プログラム、記録媒体
JP3780857B2 (ja) * 2001-03-26 2006-05-31 ヤマハ株式会社 波形編集方法および波形編集装置
WO2002084645A2 (en) * 2001-04-13 2002-10-24 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
GB2378873B (en) * 2001-04-28 2003-08-06 Hewlett Packard Co Automated compilation of music
JP4646099B2 (ja) * 2001-09-28 2011-03-09 パイオニア株式会社 オーディオ情報再生装置及びオーディオ情報再生システム
JP2003177743A (ja) * 2001-12-12 2003-06-27 Yamaha Corp 自動制御装置、鍵盤楽器、楽音発生装置、自動演奏ピアノおよびプログラム
US20030205124A1 (en) * 2002-05-01 2003-11-06 Foote Jonathan T. Method and system for retrieving and sequencing music by rhythmic similarity
KR100429885B1 (ko) * 2002-05-09 2004-05-03 삼성전자주식회사 열방출 특성을 개선한 멀티 칩 패키지
JP4243682B2 (ja) * 2002-10-24 2009-03-25 独立行政法人産業技術総合研究所 音楽音響データ中のサビ区間を検出する方法及び装置並びに該方法を実行するためのプログラム
WO2004057570A1 (en) * 2002-12-20 2004-07-08 Koninklijke Philips Electronics N.V. Ordering audio signals
JP2003241800A (ja) * 2003-02-10 2003-08-29 Yamaha Corp ディジタル信号の時間軸圧伸方法及び装置
JP2005202354A (ja) * 2003-12-19 2005-07-28 Toudai Tlo Ltd 信号解析方法
DE102004047069A1 (de) * 2004-09-28 2006-04-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Ändern einer Segmentierung eines Audiostücks
EP1840871B1 (en) * 2004-12-27 2017-07-12 P Softhouse Co. Ltd. Audio waveform processing device, method, and program
JP2007041234A (ja) * 2005-08-02 2007-02-15 Univ Of Tokyo 音楽音響信号の調推定方法および調推定装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2007036824A2 *

Also Published As

Publication number Publication date
CN101278349A (zh) 2008-10-01
WO2007036824A2 (en) 2007-04-05
US8069036B2 (en) 2011-11-29
WO2007036824A3 (en) 2007-07-19
US20080221895A1 (en) 2008-09-11
JP2009510658A (ja) 2009-03-12
KR20080066007A (ko) 2008-07-15

Similar Documents

Publication Publication Date Title
US8069036B2 (en) Method and apparatus for processing audio for playback
US11461389B2 (en) Transitions between media content items
US7756874B2 (en) System and methods for providing automatic classification of media entities according to consonance properties
US7532943B2 (en) System and methods for providing automatic classification of media entities according to sonic properties
US8082279B2 (en) System and methods for providing adaptive media property classification
US7574276B2 (en) System and methods for providing automatic classification of media entities according to melodic movement properties
US7326848B2 (en) System and methods for providing automatic classification of media entities according to tempo properties
Welsh et al. Querying large collections of music for similarity
Hargreaves et al. Structural segmentation of multitrack audio
US20140277638A1 (en) System and method of predicting user audio file preferences
Nuanáin et al. Rhythmic concatenative synthesis for electronic music: techniques, implementation, and evaluation
Eronen Signal processing methods for audio classification and music content analysis
Lerch An introduction to audio content analysis: Music Information Retrieval tasks and applications
Lidy Evaluation of new audio features and their utilization in novel music retrieval applications
Dixon Analysis of musical expression in audio signals
Pope et al. Feature extraction and database design for music software
Gärtner Tempo estimation from urban music using non-negative matrix factorization
EP2355104A1 (en) Apparatus and method for processing audio data
Norowi Human-Centred Artificial Intelligence in Concatenative Sound Synthesis
Boeckling An Automatic Drum and Bass Music DJ System
Jan APPLYING CONTENT-BASED RECOMMENDATION TO PERSONAL ITUNES MUSIC LIBRARIES
Ó Nuanáin et al. Rhythmic Concatenative Synthesis for Electronic Music
Bond Unsupervised Classification of Music Signals: Strategies Using Timbre and Rhythm
Dixon Audio Analysis Applications for Music
Coppola Software-Based Signal Processing for the Search and Comparison of Music Files

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20080502

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20120612

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20121123